[
https://issues.apache.org/jira/browse/HIVE-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161477#comment-14161477
]
Mostafa Mokhtar commented on HIVE-8280:
---------------------------------------
The issue is not yet fixed as PK side always returns selectivity of 1 due to
the following code :
{code}
if (pkSide == 1) {
FKSideInfo fkInfo = new FKSideInfo(leftRowCount,
leftNDV);
PKSideInfo pkInfo = new PKSideInfo(rightRowCount,
rightNDV,
joinRel.getJoinType().generatesNullsOnLeft() ? 1.0 :
isPKSideSimpleTree ? RelMetadataQuery.getSelectivity(right,
rightPred) : 1.0);
{code}
When a filter is applied on the PK side isPKSideSimpleTree is false.
Logs below show that the PK side has a row count of 19.27 but selectivity is 1.
{code}
HiveProjectRel(ss_customer_sk=[$1], ss_item_sk=[$0],
ss_ticket_number=[$3]): rowcount = 5.50076554E8, cumulative cost =
{5.500765732727273E8 rows, 0.0 cpu, 0.0 io}, id = 170
HiveProjectRel(ss_item_sk=[$0], ss_customer_sk=[$1],
ss_store_sk=[$2], ss_ticket_number=[$3], s_store_sk=[$4], s_market_id=[$5]):
rowcount = 5.50076554E8, cumulative cost = {5.500765732727273E8 rows, 0.0 cpu,
0.0 io}, id = 212
HiveJoinRel(condition=[=($4, $2)], joinType=[inner]): rowcount =
5.50076554E8, cumulative cost = {5.500765732727273E8 rows, 0.0 cpu, 0.0 io}, id
= 207
HiveProjectRel(ss_item_sk=[$1], ss_customer_sk=[$2],
ss_store_sk=[$6], ss_ticket_number=[$8]): rowcount = 5.50076554E8, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 162
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.store_sales]]):
rowcount = 5.50076554E8, cumulative cost = {0}, id = 6
HiveProjectRel(s_store_sk=[$0], s_market_id=[$10]): rowcount =
19.272727272727273, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 166
HiveFilterRel(condition=[=($10, 4)]): rowcount =
19.272727272727273, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 164
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.store]]): rowcount
= 212.0, cumulative cost = {0}, id = 5
{code}
> CBO : When filter is applied on dimension table PK/FK code path is not in
> effect.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-8280
> URL: https://issues.apache.org/jira/browse/HIVE-8280
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Harish Butani
> Fix For: 0.14.0
>
> Attachments: HIVE-8280.1.patch, HIVE-8280.2.patch
>
>
> When a filter is applied on PK side joins don't qualify as PK/FK join.
> In getUniqueKeys when a filter is applied on the table the child is no
> longer a table scan.
> {code}
> public Set<BitSet> getUniqueKeys(ProjectRelBase rel, boolean ignoreNulls) {
> RelNode child = rel.getChild();
> if (!(child instanceof HiveTableScanRel)) {
> Function<RelNode, Metadata> fn = RelMdUniqueKeys.SOURCE.apply(
> rel.getClass(), BuiltInMetadata.UniqueKeys.class);
> return ((BuiltInMetadata.UniqueKeys) fn.apply(rel))
> .getUniqueKeys(ignoreNulls);
> }
> {code}
> Repro
> {code}
> with ss as
> (select
> ss_customer_sk, ss_item_sk, ss_ticket_number
> from
> store_sales,
> store
> where
> s_store_sk = ss_store_sk
> and s_market_id = 4),
> sr as
> (select sr_customer_sk,sr_item_sk ,sr_ticket_number from store_returns, store
> where s_store_sk = sr_store_sk and s_market_id=4)
> select
> count(*)
> from
> ss,
> sr
> where
> ss_customer_sk = sr_customer_sk
> and ss_item_sk = sr_item_sk
> and ss_ticket_number = sr_ticket_number;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)