[ https://issues.apache.org/jira/browse/HIVE-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019277#comment-16019277 ]
Gopal V commented on HIVE-16654:
--------------------------------
The performance improvement is significant: with this patch enabled,
{{select count(distinct l_orderkey), max(l_orderkey), min(l_orderkey) from lineitem where year(l_shipdate)=1998;}}
went from 816s to 198s, roughly a 4x speedup.
> Optimize a combination of avg(), sum(), count(distinct) etc
> -----------------------------------------------------------
>
> Key: HIVE-16654
> URL: https://issues.apache.org/jira/browse/HIVE-16654
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-16654.01.patch
>
>
> An example rewrite for q28 of TPC-DS is:
> {code}
> (select LP as B1_LP, CNT as B1_CNT, CNTD as B1_CNTD
>  from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as CNTD
>        from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1
>              from store_sales
>              where ss_list_price is not null and ss_quantity between 0 and 5
>                and (ss_list_price between 11 and 11+10
>                  or ss_coupon_amt between 460 and 460+1000
>                  or ss_wholesale_cost between 14 and 14+20)
>              group by ss_list_price) ss0) ss1) B1
> {code}
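
The rewrite quoted above can be checked on a toy table. The following is a minimal sketch, not the patch itself: it assumes the pre-rewrite branch of q28 has the shape avg(ss_list_price), count(ss_list_price), count(distinct ss_list_price), uses SQLite through Python's sqlite3 module instead of Hive, keeps only the ss_quantity predicate, and runs on made-up rows. The names ROWS, ORIGINAL, REWRITTEN and run are hypothetical helpers for this illustration.

```python
import sqlite3

# Made-up rows standing in for store_sales: (ss_list_price, ss_quantity).
ROWS = [
    (10.0, 1), (10.0, 2), (12.5, 3), (None, 4),
    (20.0, 5), (12.5, 0), (30.0, 9),
]

# Assumed shape of the query before the rewrite (simplified predicate).
ORIGINAL = """
select avg(ss_list_price)            as B1_LP,
       count(ss_list_price)          as B1_CNT,
       count(distinct ss_list_price) as B1_CNTD
from store_sales
where ss_quantity between 0 and 5
"""

# Rewritten form: group by the distinct column first, then aggregate the
# per-group partial sums and counts. Each group is one distinct value, so
# count(1) over the groups equals count(distinct ss_list_price).
REWRITTEN = """
select LP as B1_LP, CNT as B1_CNT, CNTD as B1_CNTD
from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as CNTD
      from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1
            from store_sales
            where ss_list_price is not null and ss_quantity between 0 and 5
            group by ss_list_price) ss0) ss1
"""


def run(query):
    """Load the toy rows into an in-memory database and return the single result row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table store_sales (ss_list_price real, ss_quantity integer)"
    )
    conn.executemany("insert into store_sales values (?, ?)", ROWS)
    row = conn.execute(query).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(run(ORIGINAL))
    print(run(REWRITTEN))
```

Both queries return the same row on this data. The "ss_list_price is not null" filter matters in the rewritten form: without it, the NULL prices would form their own group and inflate the count(1) result by one.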