Re: Review Request 71995: TopN Key optimizer should use array instead of priority queue

Gopal V Tue, 21 Jan 2020 11:05:12 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/#review219348
-----------------------------------------------------------





ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java
Line 42 (original), 44 (patched)
<https://reviews.apache.org/r/71995/#comment307492>

    Add a counter to collect metrics for this.


- Gopal V


On Jan. 14, 2020, 3:38 p.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71995/
> -----------------------------------------------------------
> 
> (Updated Jan. 14, 2020, 3:38 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.
> 
> 
> Bugs: HIVE-22726
>     https://issues.apache.org/jira/browse/HIVE-22726
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TopN key optimizer currently uses a priority queue for keeping track of 
> the largest/smallest rows. Its maximum size equals the user-specified 
> limit. The queue should be replaced with a more cache-line-friendly array 
> with a small maximum size (128), to see how much performance is gained.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e7724f9084f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c 
> 
> 
> Diff: https://reviews.apache.org/r/71995/diff/1/
> 
> 
> Testing
> -------
> 
> Tested with the following query:
> 
> 
> use tpcds_bin_partitioned_orc_100;
> set hive.optimize.topnkey=true;
> set hive.optimize.topnkey.max=5;
> 
> select  i_item_id,
>         s_state, grouping(s_state) g_state,
>         avg(ss_quantity) agg1,
>         avg(ss_list_price) agg2,
>         avg(ss_coupon_amt) agg3,
>         avg(ss_sales_price) agg4
>  from store_sales, customer_demographics, date_dim, store, item
>  where ss_sold_date_sk = d_date_sk and
>        ss_item_sk = i_item_sk and
>        ss_store_sk = s_store_sk and
>        ss_cdemo_sk = cd_demo_sk
>  group by rollup (i_item_id, s_state)
>  order by i_item_id
>          ,s_state
>  limit 5;
> 
> 
> Results:
>   enabled:   5 rows selected (715.26 seconds)
>   enabled:   5 rows selected (605.888 seconds)
>   disabled:  5 rows selected (1208.168 seconds)
>   disabled:  5 rows selected (1219.482 seconds)
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>
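
For illustration only, the idea in the quoted description (a small sorted array in place of a priority queue) might look roughly like the sketch below. This is not the code from the patch: the class name ArrayTopNSketch, the method canForward, and both counters are hypothetical, and the sketch assumes ascending order with the N smallest distinct keys retained. The counters are one possible reading of the review comment about metrics.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: tracks the N smallest distinct keys in a sorted array. */
final class ArrayTopNSketch<T> {
    private final T[] keys;                 // sorted ascending; only [0, size) is valid
    private final Comparator<? super T> comparator;
    private int size;
    private long evaluated;                 // number of keys offered to the filter
    private long rejected;                  // number of keys filtered out

    @SuppressWarnings("unchecked")
    ArrayTopNSketch(int topN, Comparator<? super T> comparator) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.keys = (T[]) new Object[topN];
        this.comparator = comparator;
    }

    /** Returns true if the key is, so far, among the N smallest distinct keys. */
    boolean canForward(T key) {
        evaluated++;
        int pos = Arrays.binarySearch(keys, 0, size, key, comparator);
        if (pos >= 0) {
            return true;                    // key is already tracked
        }
        int insertAt = -pos - 1;
        if (insertAt == keys.length) {      // array is full and key exceeds every entry
            rejected++;
            return false;
        }
        // Shift the tail right by one; when the array is full the largest key falls off.
        int last = Math.min(size, keys.length - 1);
        System.arraycopy(keys, insertAt, keys, insertAt + 1, last - insertAt);
        keys[insertAt] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }

    long getEvaluated() { return evaluated; }

    long getRejected() { return rejected; }
}
```

With a capacity as small as the 128 mentioned in the description, the whole array is contiguous in memory, so the binary search and the shift touch only a few cache lines. Whether that beats a heap in practice is what the timings above are meant to show.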
