Aman Sinha created DRILL-602:
--------------------------------

             Summary: Query with join and group-by on join column hangs
                 Key: DRILL-602
                 URL: https://issues.apache.org/jira/browse/DRILL-602
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Aman Sinha


The following query hangs on the latest master branch:

select ps.ps_partkey, count(*)
from cp.`tpch/lineitem.parquet` l, cp.`tpch/partsupp.parquet` ps
where l.l_partkey = ps.ps_partkey and ps.ps_partkey = 30
group by ps.ps_partkey;

The plan looks OK:

 ScreenPrel: rowcount = 22.5, cumulative cost = {1398.3592534906718 rows, 307.0 cpu, 0.0 io}, id = 400
  UnionExchangePrel: rowcount = 22.5, cumulative cost = {1396.1092534906718 rows, 304.75 cpu, 0.0 io}, id = 399
    StreamAggPrel(group=[{0}], EXPR$1=[COUNT()]): rowcount = 22.5, cumulative cost = {1393.8592534906718 rows, 302.5 cpu, 0.0 io}, id = 398
      SortPrel(sort0=[$0], dir0=[ASC]): rowcount = 225.0, cumulative cost = {1371.3592534906718 rows, 302.5 cpu, 0.0 io}, id = 397
        HashToRandomExchangePrel(dist0=[[$0]]): rowcount = 225.0, cumulative cost = {883.910217292274 rows, 280.0 cpu, 0.0 io}, id = 396
          ProjectPrel(ps_partkey=[$3]): rowcount = 225.0, cumulative cost = {861.410217292274 rows, 257.5 cpu, 0.0 io}, id = 395
            MergeJoinPrel(condition=[=($1, $3)], joinType=[inner]): rowcount = 225.0, cumulative cost = {838.910217292274 rows, 235.0 cpu, 0.0 io}, id = 394
              SortPrel(sort0=[$1], dir0=[ASC]): rowcount = 100.0, cumulative cost = {478.4136148790474 rows, 121.0 cpu, 0.0 io}, id = 390
                HashToRandomExchangePrel(dist0=[[$1]]): rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 389
                  ScanPrel(table=[[cp, tpch/lineitem.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 247
              SortPrel(sort0=[$1], dir0=[ASC]): rowcount = 15.0, cumulative cost = {135.49660241322653 rows, 114.0 cpu, 0.0 io}, id = 393
                HashToRandomExchangePrel(dist0=[[$1]]): rowcount = 15.0, cumulative cost = {103.0 rows, 112.5 cpu, 0.0 io}, id = 392
                  FilterPrel(condition=[=(CAST($1):INTEGER NOT NULL, 30)]): rowcount = 15.0, cumulative cost = {101.5 rows, 111.0 cpu, 0.0 io}, id = 391
                    ScanPrel(table=[[cp, tpch/partsupp.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 191
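
Since the query itself hangs, the plan can be inspected without executing it by prefixing the statement with EXPLAIN PLAN FOR. This is a minimal sketch, assuming a Drill SQL shell such as sqlline; the listing above may have been captured another way (for example from planner logging), so the output format may differ from what is shown.

```sql
-- Asks Drill to plan the query only; no fragments are executed,
-- so this should return even though the query itself hangs.
explain plan for
select ps.ps_partkey, count(*)
from cp.`tpch/lineitem.parquet` l, cp.`tpch/partsupp.parquet` ps
where l.l_partkey = ps.ps_partkey and ps.ps_partkey = 30
group by ps.ps_partkey;
```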



--
This message was sent by Atlassian JIRA
(v6.2#6252)
