[ https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated HIVE-17474:
------------------------------------
Comment: was deleted
(was: After HIVE-15192, the join with store is converted to a map join.
The logical plan is now as follows:
{code}
TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60]
TS[1]-FIL[64]-RS[5]-JOIN[6]
TS[2]-FIL[65]-RS[10]-JOIN[11]
TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44]
TS[13]-FIL[69]-RS[18]-JOIN[19]
TS[14]-FIL[70]-RS[22]-JOIN[23]
{code}
It is reasonable that the join with the small table store is converted to a map join, so I am closing this
JIRA.)
> Poor Performance about subquery like DS/query70
> -----------------------------------------------
>
> Key: HIVE-17474
> URL: https://issues.apache.org/jira/browse/HIVE-17474
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang_intel
>
> In [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql]:
> {code}
> select
> sum(ss_net_profit) as total_sum
> ,s_state
> ,s_county
> ,grouping__id as lochierarchy
> , rank() over(partition by grouping__id, case when grouping__id == 2 then
> s_state end order by sum(ss_net_profit)) as rank_within_parent
> from
> store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk
> join store s on s.s_store_sk = ss.ss_store_sk
> where
> d1.d_month_seq between 1193 and 1193+11
> and s.s_state in
> ( select s_state
> from (select s_state as s_state, sum(ss_net_profit),
> rank() over ( partition by s_state order by
> sum(ss_net_profit) desc) as ranking
> from store_sales, store, date_dim
> where d_month_seq between 1193 and 1193+11
> and date_dim.d_date_sk =
> store_sales.ss_sold_date_sk
> and store.s_store_sk = store_sales.ss_store_sk
> group by s_state
> ) tmp1
> where ranking <= 5
> )
> group by s_state,s_county with rollup
> order by
> lochierarchy desc
> ,case when lochierarchy = 0 then s_state end
> ,rank_within_parent
> limit 100;
> {code}
> Let's analyze the query:
> part1: evaluates the subquery and returns the states whose ranking (rank
> over sum(ss_net_profit)) is at most 5.
> part2: joins the big table store_sales with the small tables date_dim and
> store.
> part3: joins part2 with part1 (the s_state IN subquery).
> The problem is in part3, which runs as a common join. The join key s_state
> has few distinct values on both sides (there are only 30 different values in
> table store). With a common join, rows are shuffled by s_state, so the large
> volume of data from part2 is processed by at most 30 reducers.
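A small Python sketch can make the part3 problem concrete. It is a toy model, not Hive's implementation: the row shape, the synthetic state names, the 200-reducer setting and the crc32-based partitioner are all hypothetical. It contrasts a common join, where rows are shuffled to reducers by hashing s_state, with a map join, where the small result of part1 is kept in memory and probed by every mapper.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical row shape for part2 (the big side): (s_state, ss_net_profit).
Row = Tuple[str, float]


def common_join_reducer_load(rows: Iterable[Row], num_reducers: int) -> Counter:
    """Common (shuffle) join: each row goes to the reducer picked by hashing
    its join key, so all rows with the same s_state land on one reducer.
    Returns a Counter mapping reducer index -> number of rows received.
    zlib.crc32 is an illustrative stand-in, not Hive's partitioner."""
    load: Counter = Counter()
    for state, _profit in rows:
        load[zlib.crc32(state.encode("utf-8")) % num_reducers] += 1
    return load


def map_join(rows: Iterable[Row], small_side: Set[str]) -> List[Row]:
    """Map join: the small side (part1, the states returned by the IN
    subquery) is held in memory as a hash set, and each mapper probes it
    for its own rows, so nothing is shuffled by s_state."""
    return [row for row in rows if row[0] in small_side]


# Synthetic data: 30 distinct states (as in table store), 300,000 big-side rows.
STATES = [f"S{i:02d}" for i in range(30)]
ROWS: List[Row] = [(STATES[i % len(STATES)], 1.0) for i in range(300_000)]

load = common_join_reducer_load(ROWS, num_reducers=200)
# At most 30 of the 200 reducers get any rows; each state's 10,000 rows
# all go to a single reducer.
print("reducers receiving rows:", len(load))
print("rows on busiest reducer:", max(load.values()))
print("rows kept by map join:", len(map_join(ROWS, set(STATES[:5]))))
```

Raising num_reducers does not help in this model: the partitioner can map 30 distinct keys to at most 30 buckets. zlib.crc32 is used rather than the built-in hash() because Python randomizes string hashes per process, which would make the reducer assignment differ between runs.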
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)