-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
-----------------------------------------------------------

(Updated Jan. 22, 2020, 9:44 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.


Bugs: HIVE-22726
    https://issues.apache.org/jira/browse/HIVE-22726


Repository: hive-git


Description
-------

The TopN key optimizer currently uses a priority queue to keep track of the
largest/smallest rows. Its maximum size is the same as the user-specified limit.
This should be replaced with a more cache-line-friendly array with a small (128)
maximum size, to see how much performance is gained.
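
As a rough illustration of the idea (not the code in this patch), a small
sorted array can stand in for the priority queue. The class name
ArrayTopNKeyFilter, the canForward method, and the linear-scan insertion are
all assumptions made for this sketch. It keeps the N smallest distinct keys
in one contiguous array and reports whether a row with the given key could
still belong to the top N:

```java
import java.util.Comparator;

/**
 * Hypothetical sketch of an array-backed top-N key filter.
 * Not the patch's implementation; names and details are assumptions.
 */
final class ArrayTopNKeyFilter<T> {
    // Keys sorted ascending by the comparator; only [0, size) is in use.
    private final Object[] keys;
    private final Comparator<? super T> comparator;
    private int size;

    ArrayTopNKeyFilter(int topN, Comparator<? super T> comparator) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.keys = new Object[topN];
        this.comparator = comparator;
    }

    /** Returns true if the key is among the topN smallest distinct keys seen so far. */
    @SuppressWarnings("unchecked")
    boolean canForward(T key) {
        // Linear scan for the insertion point; the array is assumed to be small.
        int pos = 0;
        while (pos < size) {
            int c = comparator.compare((T) keys[pos], key);
            if (c == 0) {
                return true;   // key is already tracked
            }
            if (c > 0) {
                break;         // first tracked key greater than the new one
            }
            pos++;
        }
        if (pos == keys.length) {
            return false;      // array is full and the key is larger than every tracked key
        }
        // Shift the tail right by one; when full, the largest key falls off the end.
        int last = Math.min(size, keys.length - 1);
        System.arraycopy(keys, pos, keys, pos + 1, last - pos);
        keys[pos] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }
}
```

Insertion costs O(N) because of the shift, compared with O(log N) for a
heap, but with N capped at a small value such as 128 the whole array spans
only a few cache lines. Passing a reversed comparator would track the
largest keys instead.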


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b79515fcf07 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java 0786c82b7be 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 8cb48473785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 


Diff: https://reviews.apache.org/r/71995/diff/3/

Changes: https://reviews.apache.org/r/71995/diff/2-3/


Testing
-------

Tested with the following query:


use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;

select  i_item_id,
        s_state, grouping(s_state) g_state,
        avg(ss_quantity) agg1,
        avg(ss_list_price) agg2,
        avg(ss_coupon_amt) agg3,
        avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, store, item
 where ss_sold_date_sk = d_date_sk and
       ss_item_sk = i_item_sk and
       ss_store_sk = s_store_sk and
       ss_cdemo_sk = cd_demo_sk
 group by rollup (i_item_id, s_state)
 order by i_item_id
         ,s_state
 limit 5;


Results:
  enabled:   5 rows selected (715.26 seconds)
  enabled:   5 rows selected (605.888 seconds)
  disabled:  5 rows selected (1208.168 seconds)
  disabled:  5 rows selected (1219.482 seconds)


Thanks,

Attila Magyar
