[
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780025#comment-16780025
]
Vineet Garg commented on HIVE-21340:
------------------------------------
The problem is with HiveSemiJoinRule. Column pruning is occurring; for example,
the plan just before HiveSemiJoinRule is:
{code:sql}
HiveAggregate(group=[{}], agg#0=[count()])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(i_item_sk=[$0])
      HiveFilter(condition=[IS NOT NULL($0)])
        HiveTableScan(table=[[perf, item]], table:alias=[item])
    HiveAggregate(group=[{0}])
      HiveFilter(condition=[>($2, 1)])
        HiveAggregate(group=[{2, 9}], agg#0=[count()])
          HiveFilter(condition=[IS NOT NULL($2)])
            HiveTableScan(table=[[perf, store_sales]], table:alias=[store_sales])
{code}
HiveSemiJoinRule rewrites the HiveJoin + HiveAggregate into a HiveSemiJoin. It
does not introduce a HiveProject as a replacement for the HiveAggregate; as a
result, the schema changes to whatever the HiveAggregate's input is (HiveFilter
in this case).
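
The schema change can be shown with a toy model. This is not Hive or Calcite code: the Rel class, the semijoin_right_input function and the add_project flag are hypothetical names used only for illustration, and the column names are taken from the query in the issue description. It assumes the fix is to place a projection of the aggregate's group keys over the aggregate's input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rel:
    """Toy relational operator: a name, its output columns, one optional input."""
    op: str
    columns: List[str]
    input: Optional["Rel"] = None


def semijoin_right_input(aggregate: Rel, add_project: bool) -> Rel:
    """Return the operator that replaces `aggregate` under the semi-join."""
    child = aggregate.input
    if not add_project:
        # Behaviour described above: the aggregate is dropped and its input
        # is used directly, so the schema becomes the input's schema.
        return child
    # Assumed fix: keep only the aggregate's group keys with a projection.
    return Rel("HiveProject", list(aggregate.columns), child)


# Right side of the join, modelled on the plan above.
having = Rel("HiveFilter", ["ss_item_sk", "ss_ticket_number", "$f2"])
distinct_keys = Rel("HiveAggregate", ["ss_item_sk"], having)

print(semijoin_right_input(distinct_keys, add_project=False).columns)
print(semijoin_right_input(distinct_keys, add_project=True).columns)
```

Without the projection the semi-join sees all three columns of the HiveFilter; with it, only ss_item_sk remains.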
> CBO: Prune non-key columns feeding into a SemiJoin
> --------------------------------------------------
>
> Key: HIVE-21340
> URL: https://issues.apache.org/jira/browse/HIVE-21340
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Vineet Garg
> Priority: Major
>
> {code}
> explain cbo
> with ss as
> (select count(1), ss_item_sk, ss_ticket_number from
> store_sales group by ss_item_sk, ss_ticket_number
> having count(1) > 1)
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}}:
> only ss_item_sk is relevant for the HiveSemiJoin.
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
>     HiveProject(i_item_sk=[$0])
>       HiveFilter(condition=[IS NOT NULL($0)])
>         HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, item]], table:alias=[item])
>     HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>       HiveFilter(condition=[>($2, 1)])
>         HiveAggregate(group=[{1, 8}], agg#0=[count()])
>           HiveFilter(condition=[IS NOT NULL($1)])
>             HiveTableScan(table=[[tpcds_copy_orc_partitioned_10000, store_sales]], table:alias=[store_sales])
> {code}