[
https://issues.apache.org/jira/browse/DRILL-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044226#comment-14044226
]
Aman Sinha commented on DRILL-989:
----------------------------------
Based on the Explain plan, the wrong result is likely caused by incorrect
distribution on two sides of a hash join. Note that the left side of the
HashJoin 02-08 below is distributing on 2 columns ($0 and $2). Right side of
the HashJoin is the Project 02-09 which is distributed on 1 column only based
on the HashToRandomExchange below the HashAggregate. We should either be
distributing on both columns on both sides of the join or pick one join key
(based on certain criteria) and distribute both sides on the corresponding join
key.
> 02-08 HashJoin(condition=[AND(=($0, $3), =($2,
> $4))], joinType=[left])
> 02-10 HashToRandomExchange(dist0=[[$0]],
> dist1=[[$2]])
> 04-01 Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02 HashJoin(condition=[=($3, $4)],
> joinType=[inner])
> 04-04 Project($f1=[$0], $f2=[$2],
> $f3=[$1], $f4=[$0])
> 04-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/partsupp]],
> selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath
> [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03 BroadcastExchange
> 07-01 HashAgg(group=[{0}])
> 07-02
> HashToRandomExchange(dist0=[[$0]])
> 09-01 Project(p_partkey=[$1])
> 09-02 SelectionVectorRemover
> 09-03 Filter(condition=[LIKE($0,
> 'antique%')])
> 09-04
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/part]],
> selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath
> [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09 Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11 HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12 HashToRandomExchange(dist0=[[$0]])
> TPCH 20 returning wrong results
> -------------------------------
>
> Key: DRILL-989
> URL: https://issues.apache.org/jira/browse/DRILL-989
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Reporter: Ramana Inukonda Nagaraj
>
> TPCH 20 returns 0 rows, successfully executes though.
> Text version of physical plan:
> {code}
> 00-00 Screen
> 00-01 Project(s_name=[$0], s_address=[$1])
> 00-02 SingleMergeExchange(sort0=[0 ASC])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Project(s_name=[$0], s_address=[$1])
> 02-02 HashJoin(condition=[=($2, $3)], joinType=[inner])
> 02-04 HashToRandomExchange(dist0=[[$2]])
> 03-01 Project($f3=[$2], $f4=[$3], $f8=[$0])
> 03-02 HashJoin(condition=[=($1, $5)], joinType=[inner])
> 03-04 Project(s_suppkey=[$3], s_nationkey=[$2],
> s_name=[$1], s_address=[$0])
> 03-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/supplier]],
> selectionRoot=/drill/testdata/tpch-multi/supplier, columns=[SchemaPath
> [`s_suppkey`], SchemaPath [`s_nationkey`], SchemaPath [`s_name`], SchemaPath
> [`s_address`]]]])
> 03-03 BroadcastExchange
> 06-01 SelectionVectorRemover
> 06-02 Filter(condition=[=(CAST($0):CHAR(5)
> CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'KENYA')])
> 06-03 Project(n_name=[$1], n_nationkey=[$0])
> 06-04 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/nation]],
> selectionRoot=/drill/testdata/tpch-multi/nation, columns=[SchemaPath
> [`n_name`], SchemaPath [`n_nationkey`]]]])
> 02-03 StreamAgg(group=[{0}])
> 02-05 Project(ps_suppkey=[$2])
> 02-06 SelectionVectorRemover
> 02-07 Filter(condition=[AND(true, >($1, CAST(*(0.5,
> $5)):ANY))])
> 02-08 HashJoin(condition=[AND(=($0, $3), =($2,
> $4))], joinType=[left])
> 02-10 HashToRandomExchange(dist0=[[$0]],
> dist1=[[$2]])
> 04-01 Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02 HashJoin(condition=[=($3, $4)],
> joinType=[inner])
> 04-04 Project($f1=[$0], $f2=[$2],
> $f3=[$1], $f4=[$0])
> 04-05 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/partsupp]],
> selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath
> [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03 BroadcastExchange
> 07-01 HashAgg(group=[{0}])
> 07-02
> HashToRandomExchange(dist0=[[$0]])
> 09-01 Project(p_partkey=[$1])
> 09-02 SelectionVectorRemover
> 09-03 Filter(condition=[LIKE($0,
> 'antique%')])
> 09-04
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/part]],
> selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath
> [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09 Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11 HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12 HashToRandomExchange(dist0=[[$0]])
> 05-01 Project($f0=[$4], $f1=[$5],
> l_quantity=[$3])
> 05-02 HashJoin(condition=[AND(=($0, $4),
> =($1, $5))], joinType=[inner])
> 05-04 SelectionVectorRemover
> 05-05 Filter(condition=[AND(>=($2,
> 1993-01-01), <($2, +(1993-01-01, 12)))])
> 05-06 Project(l_partkey=[$2],
> l_suppkey=[$1], l_shipdate=[$3], l_quantity=[$0])
> 05-07
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/lineitem]],
> selectionRoot=/drill/testdata/tpch-multi/lineitem, columns=[SchemaPath
> [`l_partkey`], SchemaPath [`l_suppkey`], SchemaPath [`l_shipdate`],
> SchemaPath [`l_quantity`]]]])
> 05-03 BroadcastExchange
> 08-01 HashAgg(group=[{0, 1}])
> 08-02
> HashToRandomExchange(dist0=[[$0]])
> 10-01 Project($f0=[$0], $f1=[$1])
> 10-02
> HashJoin(condition=[=($2, $3)], joinType=[inner])
> 10-04 Project($f1=[$0],
> $f3=[$1], $f4=[$0])
> 10-05
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/partsupp]],
> selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath
> [`ps_partkey`], SchemaPath [`ps_suppkey`]]]])
> 10-03 BroadcastExchange
> 11-01 HashAgg(group=[{0}])
> 11-02
> HashToRandomExchange(dist0=[[$0]])
> 12-01
> Project(p_partkey=[$1])
> 12-02
> SelectionVectorRemover
> 12-03
> Filter(condition=[LIKE($0, 'antique%')])
> 12-04
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/tpch-multi/part]],
> selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath
> [`p_name`], SchemaPath [`p_partkey`]]]])
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)