[jira] [Created] (IMPALA-6237) Mismatch in plannerTest.testJoins output

Michael Ho (JIRA) Wed, 22 Nov 2017 11:14:21 -0800

Michael Ho created IMPALA-6237:
----------------------------------

             Summary: Mismatch in plannerTest.testJoins output
                 Key: IMPALA-6237
                 URL: https://issues.apache.org/jira/browse/IMPALA-6237
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.11.0
            Reporter: Michael Ho
            Priority: Blocker



PlannerTest (testJoins and testTpchNested) has recently started failing quite
consistently. The following are some of the mismatches between the expected
and actual outputs:

[~tianyiwang], [~alex.behm], any chance this may be related to recent changes
in the planner?

{noformat}
Error Message

Section DISTRIBUTEDPLAN of query:
select *
from functional.alltypesagg a
full outer join functional.alltypessmall b using (id, int_col)
right join functional.alltypesaggnonulls c on (a.id = c.id and b.string_col = c.string_col)
where a.day >= 6
and b.month > 2
and c.day < 3
and a.tinyint_col = 15
and b.string_col = '15'
and a.tinyint_col + b.tinyint_col < 15
and a.float_col - c.double_col < 0
and (b.double_col * c.tinyint_col > 1000 or c.tinyint_col < 1000)

Actual does not match expected result:
PLAN-ROOT SINK
|
09:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|  hash predicates: c.id = a.id, c.string_col = b.string_col
|  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
|
|--08:EXCHANGE [HASH(a.id,b.string_col)]
|  |
|  03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
|  |  hash predicates: a.id = b.id, a.int_col = b.int_col
|  |
|  |--06:EXCHANGE [HASH(b.id,b.int_col)]
|  |  |
|  |  01:SCAN HDFS [functional.alltypessmall b]
|  |     partitions=2/4 files=2 size=3.17KB
|  |     predicates: b.string_col = '15'
|  |
|  05:EXCHANGE [HASH(a.id,a.int_col)]
|  |
|  00:SCAN HDFS [functional.alltypesagg a]
|     partitions=5/11 files=5 size=372.38KB
|     predicates: a.tinyint_col = 15
|
07:EXCHANGE [HASH(c.id,c.string_col)]
|
02:SCAN HDFS [functional.alltypesaggnonulls c]
   partitions=2/10 files=2 size=148.10KB

Expected:
PLAN-ROOT SINK
|
09:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
|  hash predicates: a.id = c.id, b.string_col = c.string_col
|  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
|  runtime filters: RF000 <- c.id, RF001 <- c.string_col
|
|--08:EXCHANGE [HASH(c.id,c.string_col)]
|  |
|  02:SCAN HDFS [functional.alltypesaggnonulls c]
|     partitions=2/10 files=2 size=148.10KB
|
07:EXCHANGE [HASH(a.id,b.string_col)]
|
03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
|  hash predicates: a.id = b.id, a.int_col = b.int_col
|
|--06:EXCHANGE [HASH(b.id,b.int_col)]
|  |
|  01:SCAN HDFS [functional.alltypessmall b]
|     partitions=2/4 files=2 size=3.17KB
|     predicates: b.string_col = '15'
|     runtime filters: RF001 -> b.string_col
|
05:EXCHANGE [HASH(a.id,a.int_col)]
|
00:SCAN HDFS [functional.alltypesagg a]
   partitions=5/11 files=5 size=372.38KB
   predicates: a.tinyint_col = 15
   runtime filters: RF000 -> a.id

Verbose plan:
F05:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Host Resources: mem-estimate=0B mem-reservation=0B
  PLAN-ROOT SINK
  |  mem-estimate=0B mem-reservation=0B
  |
  09:EXCHANGE [UNPARTITIONED]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=2,0N,1N row-size=303B cardinality=2000

F04:PLAN FRAGMENT [HASH(c.id,c.string_col)] hosts=2 instances=2
Per-Host Resources: mem-estimate=1.94MB mem-reservation=1.94MB
  DATASTREAM SINK [FRAGMENT=F05, EXCHANGE=09, UNPARTITIONED]
  |  mem-estimate=0B mem-reservation=0B
  04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
  |  hash predicates: c.id = a.id, c.string_col = b.string_col
  |  fk/pk conjuncts: c.id = a.id
  |  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=2,0N,1N row-size=303B cardinality=2000
  |
  |--08:EXCHANGE [HASH(a.id,b.string_col)]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=0N,1N row-size=200B cardinality=561
  |
  07:EXCHANGE [HASH(c.id,c.string_col)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=2 row-size=103B cardinality=2000

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=07, HASH(c.id,c.string_col)]
  |  mem-estimate=0B mem-reservation=0B
  02:SCAN HDFS [functional.alltypesaggnonulls c, RANDOM]
     partitions=2/10 files=2 size=148.10KB
     stats-rows=2000 extrapolated-rows=disabled
     table stats: rows=10000 size=744.82KB
     column stats: all
     mem-estimate=32.00MB mem-reservation=0B
     tuple-ids=2 row-size=103B cardinality=2000

F03:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=1.94MB mem-reservation=1.94MB
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=08, HASH(a.id,b.string_col)]
  |  mem-estimate=0B mem-reservation=0B
  03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
  |  hash predicates: a.id = b.id, a.int_col = b.int_col
  |  fk/pk conjuncts: a.id = b.id, a.int_col = b.int_col
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=0N,1N row-size=200B cardinality=561
  |
  |--06:EXCHANGE [HASH(b.id,b.int_col)]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=1 row-size=97B cardinality=5
  |
  05:EXCHANGE [HASH(a.id,a.int_col)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=0 row-size=103B cardinality=556

F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=48.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, HASH(a.id,a.int_col)]
  |  mem-estimate=0B mem-reservation=0B
  00:SCAN HDFS [functional.alltypesagg a, RANDOM]
     partitions=5/11 files=5 size=372.38KB
     predicates: a.tinyint_col = 15
     stats-rows=5000 extrapolated-rows=disabled
     table stats: rows=11000 size=814.73KB
     column stats: all
     parquet dictionary predicates: a.tinyint_col = 15
     mem-estimate=48.00MB mem-reservation=0B
     tuple-ids=0 row-size=103B cardinality=556

F02:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, HASH(b.id,b.int_col)]
  |  mem-estimate=0B mem-reservation=0B
  01:SCAN HDFS [functional.alltypessmall b, RANDOM]
     partitions=2/4 files=2 size=3.17KB
     predicates: b.string_col = '15'
     stats-rows=50 extrapolated-rows=disabled
     table stats: rows=100 size=6.32KB
     column stats: all
     parquet dictionary predicates: b.string_col = '15'
     mem-estimate=32.00MB mem-reservation=0B
     tuple-ids=1 row-size=97B cardinality=5
Stacktrace

java.lang.AssertionError: 
Section DISTRIBUTEDPLAN of query:
select *
from functional.alltypesagg a
full outer join functional.alltypessmall b using (id, int_col)
right join functional.alltypesaggnonulls c on (a.id = c.id and b.string_col = c.string_col)
where a.day >= 6
and b.month > 2
and c.day < 3
and a.tinyint_col = 15
and b.string_col = '15'
and a.tinyint_col + b.tinyint_col < 15
and a.float_col - c.double_col < 0
and (b.double_col * c.tinyint_col > 1000 or c.tinyint_col < 1000)

Actual does not match expected result:
PLAN-ROOT SINK
|
09:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|  hash predicates: c.id = a.id, c.string_col = b.string_col
|  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
|
|--08:EXCHANGE [HASH(a.id,b.string_col)]
|  |
|  03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
|  |  hash predicates: a.id = b.id, a.int_col = b.int_col
|  |
|  |--06:EXCHANGE [HASH(b.id,b.int_col)]
|  |  |
|  |  01:SCAN HDFS [functional.alltypessmall b]
|  |     partitions=2/4 files=2 size=3.17KB
|  |     predicates: b.string_col = '15'
|  |
|  05:EXCHANGE [HASH(a.id,a.int_col)]
|  |
|  00:SCAN HDFS [functional.alltypesagg a]
|     partitions=5/11 files=5 size=372.38KB
|     predicates: a.tinyint_col = 15
|
07:EXCHANGE [HASH(c.id,c.string_col)]
|
02:SCAN HDFS [functional.alltypesaggnonulls c]
   partitions=2/10 files=2 size=148.10KB

Expected:
PLAN-ROOT SINK
|
09:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
|  hash predicates: a.id = c.id, b.string_col = c.string_col
|  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
|  runtime filters: RF000 <- c.id, RF001 <- c.string_col
|
|--08:EXCHANGE [HASH(c.id,c.string_col)]
|  |
|  02:SCAN HDFS [functional.alltypesaggnonulls c]
|     partitions=2/10 files=2 size=148.10KB
|
07:EXCHANGE [HASH(a.id,b.string_col)]
|
03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
|  hash predicates: a.id = b.id, a.int_col = b.int_col
|
|--06:EXCHANGE [HASH(b.id,b.int_col)]
|  |
|  01:SCAN HDFS [functional.alltypessmall b]
|     partitions=2/4 files=2 size=3.17KB
|     predicates: b.string_col = '15'
|     runtime filters: RF001 -> b.string_col
|
05:EXCHANGE [HASH(a.id,a.int_col)]
|
00:SCAN HDFS [functional.alltypesagg a]
   partitions=5/11 files=5 size=372.38KB
   predicates: a.tinyint_col = 15
   runtime filters: RF000 -> a.id

Verbose plan:
F05:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Host Resources: mem-estimate=0B mem-reservation=0B
  PLAN-ROOT SINK
  |  mem-estimate=0B mem-reservation=0B
  |
  09:EXCHANGE [UNPARTITIONED]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=2,0N,1N row-size=303B cardinality=2000

F04:PLAN FRAGMENT [HASH(c.id,c.string_col)] hosts=2 instances=2
Per-Host Resources: mem-estimate=1.94MB mem-reservation=1.94MB
  DATASTREAM SINK [FRAGMENT=F05, EXCHANGE=09, UNPARTITIONED]
  |  mem-estimate=0B mem-reservation=0B
  04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
  |  hash predicates: c.id = a.id, c.string_col = b.string_col
  |  fk/pk conjuncts: c.id = a.id
  |  other predicates: a.tinyint_col = 15, b.string_col = '15', a.day >= 6, b.month > 2, a.float_col - c.double_col < 0, a.tinyint_col + b.tinyint_col < 15, (b.double_col * c.tinyint_col > 1000 OR c.tinyint_col < 1000)
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=2,0N,1N row-size=303B cardinality=2000
  |
  |--08:EXCHANGE [HASH(a.id,b.string_col)]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=0N,1N row-size=200B cardinality=561
  |
  07:EXCHANGE [HASH(c.id,c.string_col)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=2 row-size=103B cardinality=2000

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=07, HASH(c.id,c.string_col)]
  |  mem-estimate=0B mem-reservation=0B
  02:SCAN HDFS [functional.alltypesaggnonulls c, RANDOM]
     partitions=2/10 files=2 size=148.10KB
     stats-rows=2000 extrapolated-rows=disabled
     table stats: rows=10000 size=744.82KB
     column stats: all
     mem-estimate=32.00MB mem-reservation=0B
     tuple-ids=2 row-size=103B cardinality=2000

F03:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=1.94MB mem-reservation=1.94MB
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=08, HASH(a.id,b.string_col)]
  |  mem-estimate=0B mem-reservation=0B
  03:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
  |  hash predicates: a.id = b.id, a.int_col = b.int_col
  |  fk/pk conjuncts: a.id = b.id, a.int_col = b.int_col
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=0N,1N row-size=200B cardinality=561
  |
  |--06:EXCHANGE [HASH(b.id,b.int_col)]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=1 row-size=97B cardinality=5
  |
  05:EXCHANGE [HASH(a.id,a.int_col)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=0 row-size=103B cardinality=556

F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=48.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, HASH(a.id,a.int_col)]
  |  mem-estimate=0B mem-reservation=0B
  00:SCAN HDFS [functional.alltypesagg a, RANDOM]
     partitions=5/11 files=5 size=372.38KB
     predicates: a.tinyint_col = 15
     stats-rows=5000 extrapolated-rows=disabled
     table stats: rows=11000 size=814.73KB
     column stats: all
     parquet dictionary predicates: a.tinyint_col = 15
     mem-estimate=48.00MB mem-reservation=0B
     tuple-ids=0 row-size=103B cardinality=556

F02:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, HASH(b.id,b.int_col)]
  |  mem-estimate=0B mem-reservation=0B
  01:SCAN HDFS [functional.alltypessmall b, RANDOM]
     partitions=2/4 files=2 size=3.17KB
     predicates: b.string_col = '15'
     stats-rows=50 extrapolated-rows=disabled
     table stats: rows=100 size=6.32KB
     column stats: all
     parquet dictionary predicates: b.string_col = '15'
     mem-estimate=32.00MB mem-reservation=0B
     tuple-ids=1 row-size=97B cardinality=5

        at org.junit.Assert.fail(Assert.java:88)
        at org.apache.impala.planner.PlannerTestBase.runPlannerTestFile(PlannerTestBase.java:770)
        at org.apache.impala.planner.PlannerTestBase.runPlannerTestFile(PlannerTestBase.java:775)
        at org.apache.impala.planner.PlannerTest.testJoins(PlannerTest.java:127)
{noformat}

{noformat}
Error Message

Section DISTRIBUTEDPLAN of query:
select
  sum(l_extendedprice) / 7.0 as avg_yearly
from
  customer.c_orders.o_lineitems l,
  part p
where
  p_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'
  and l_quantity < (
    select
      0.2 * avg(l_quantity)
    from
      customer.c_orders.o_lineitems l
    where
      l_partkey = p_partkey
  )

Actual does not match expected result:
PLAN-ROOT SINK
|
12:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice)
|
11:EXCHANGE [UNPARTITIONED]
|
06:AGGREGATE
|  output: sum(l_extendedprice)
|
05:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|  hash predicates: p_partkey = l_partkey
|  other join predicates: l_quantity < 0.2 * avg(l_quantity)
|  runtime filters: RF000 <- l_partkey
|
|--10:EXCHANGE [BROADCAST]
|  |
|  09:AGGREGATE [FINALIZE]
|  |  output: avg:merge(l_quantity)
|  |  group by: l_partkey
|  |
|  08:EXCHANGE [HASH(l_partkey)]
|  |
|  03:AGGREGATE [STREAMING]
|  |  output: avg(l_quantity)
|  |  group by: l_partkey
|  |
|  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
|     partitions=1/1 files=4 size=292.36MB
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: l_partkey = p_partkey
|  runtime filters: RF002 <- p_partkey
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [tpch_nested_parquet.part p]
|     partitions=1/1 files=1 size=6.23MB
|     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
|     runtime filters: RF000 -> p_partkey
|
00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
   partitions=1/1 files=4 size=292.36MB
   runtime filters: RF000 -> l.l_partkey, RF002 -> l_partkey

Expected:
PLAN-ROOT SINK
|
12:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice)
|
11:EXCHANGE [UNPARTITIONED]
|
06:AGGREGATE
|  output: sum(l_extendedprice)
|
05:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
|  hash predicates: p_partkey = l_partkey
|  other join predicates: l_quantity < 0.2 * avg(l_quantity)
|  runtime filters: RF000 <- l_partkey
|
|--09:AGGREGATE [FINALIZE]
|  |  output: avg:merge(l_quantity)
|  |  group by: l_partkey
|  |
|  08:EXCHANGE [HASH(l_partkey)]
|  |
|  03:AGGREGATE [STREAMING]
|  |  output: avg(l_quantity)
|  |  group by: l_partkey
|  |
|  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
|     partitions=1/1 files=4 size=292.36MB
|
10:EXCHANGE [HASH(p_partkey)]
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: l_partkey = p_partkey
|  runtime filters: RF002 <- p_partkey
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [tpch_nested_parquet.part p]
|     partitions=1/1 files=1 size=6.24MB
|     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
|     runtime filters: RF000 -> p_partkey
|
00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
   partitions=1/1 files=4 size=292.36MB
   runtime filters: RF000 -> l.l_partkey, RF002 -> l_partkey

Verbose plan:
F04:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B
  PLAN-ROOT SINK
  |  mem-estimate=0B mem-reservation=0B
  |
  12:AGGREGATE [FINALIZE]
  |  output: sum:merge(l_extendedprice)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB
  |  tuple-ids=6 row-size=16B cardinality=1
  |
  11:EXCHANGE [UNPARTITIONED]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=6 row-size=16B cardinality=1

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=527.71MB mem-reservation=35.94MB
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=11, UNPARTITIONED]
  |  mem-estimate=0B mem-reservation=0B
  06:AGGREGATE
  |  output: sum(l_extendedprice)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB
  |  tuple-ids=6 row-size=16B cardinality=1
  |
  05:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
  |  hash predicates: p_partkey = l_partkey
  |  other join predicates: l_quantity < 0.2 * avg(l_quantity)
  |  runtime filters: RF000[bloom] <- l_partkey
  |  mem-estimate=251.77MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=0,1 row-size=80B cardinality=15000000
  |
  |--10:EXCHANGE [BROADCAST]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=4 row-size=16B cardinality=15000000
  |
  04:HASH JOIN [INNER JOIN, BROADCAST]
  |  hash predicates: l_partkey = p_partkey
  |  fk/pk conjuncts: assumed fk/pk
  |  runtime filters: RF002[bloom] <- p_partkey
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=0,1 row-size=80B cardinality=15000000
  |
  |--07:EXCHANGE [BROADCAST]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=1 row-size=56B cardinality=1000
  |
  00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l, RANDOM]
     partitions=1/1 files=4 size=292.36MB
     runtime filters: RF000[bloom] -> l.l_partkey, RF002[bloom] -> l_partkey
     stats-rows=150000 extrapolated-rows=disabled
     table stats: rows=150000 size=292.36MB
     column stats: all
     mem-estimate=264.00MB mem-reservation=0B
     tuple-ids=0 row-size=24B cardinality=15000000

F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=48.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=07, BROADCAST]
  |  mem-estimate=0B mem-reservation=0B
  01:SCAN HDFS [tpch_nested_parquet.part p, RANDOM]
     partitions=1/1 files=1 size=6.23MB
     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     runtime filters: RF000[bloom] -> p_partkey
     stats-rows=200000 extrapolated-rows=disabled
     table stats: rows=200000 size=6.23MB
     column stats: all
     parquet statistics predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     parquet dictionary predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     mem-estimate=48.00MB mem-reservation=0B
     tuple-ids=1 row-size=56B cardinality=1000

F03:PLAN FRAGMENT [HASH(l_partkey)] hosts=2 instances=2
Per-Host Resources: mem-estimate=128.00MB mem-reservation=34.00MB
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=10, BROADCAST]
  |  mem-estimate=0B mem-reservation=0B
  09:AGGREGATE [FINALIZE]
  |  output: avg:merge(l_quantity)
  |  group by: l_partkey
  |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=4 row-size=16B cardinality=15000000
  |
  08:EXCHANGE [HASH(l_partkey)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=3 row-size=16B cardinality=15000000

F02:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=304.00MB mem-reservation=34.00MB
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=08, HASH(l_partkey)]
  |  mem-estimate=0B mem-reservation=0B
  03:AGGREGATE [STREAMING]
  |  output: avg(l_quantity)
  |  group by: l_partkey
  |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=3 row-size=16B cardinality=15000000
  |
  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l, RANDOM]
     partitions=1/1 files=4 size=292.36MB
     stats-rows=150000 extrapolated-rows=disabled
     table stats: rows=150000 size=292.36MB
     column stats: all
     mem-estimate=176.00MB mem-reservation=0B
     tuple-ids=2 row-size=16B cardinality=15000000
Stacktrace

java.lang.AssertionError: 
Section DISTRIBUTEDPLAN of query:
select
  sum(l_extendedprice) / 7.0 as avg_yearly
from
  customer.c_orders.o_lineitems l,
  part p
where
  p_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'
  and l_quantity < (
    select
      0.2 * avg(l_quantity)
    from
      customer.c_orders.o_lineitems l
    where
      l_partkey = p_partkey
  )

Actual does not match expected result:
PLAN-ROOT SINK
|
12:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice)
|
11:EXCHANGE [UNPARTITIONED]
|
06:AGGREGATE
|  output: sum(l_extendedprice)
|
05:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|  hash predicates: p_partkey = l_partkey
|  other join predicates: l_quantity < 0.2 * avg(l_quantity)
|  runtime filters: RF000 <- l_partkey
|
|--10:EXCHANGE [BROADCAST]
|  |
|  09:AGGREGATE [FINALIZE]
|  |  output: avg:merge(l_quantity)
|  |  group by: l_partkey
|  |
|  08:EXCHANGE [HASH(l_partkey)]
|  |
|  03:AGGREGATE [STREAMING]
|  |  output: avg(l_quantity)
|  |  group by: l_partkey
|  |
|  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
|     partitions=1/1 files=4 size=292.36MB
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: l_partkey = p_partkey
|  runtime filters: RF002 <- p_partkey
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [tpch_nested_parquet.part p]
|     partitions=1/1 files=1 size=6.23MB
|     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
|     runtime filters: RF000 -> p_partkey
|
00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
   partitions=1/1 files=4 size=292.36MB
   runtime filters: RF000 -> l.l_partkey, RF002 -> l_partkey

Expected:
PLAN-ROOT SINK
|
12:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice)
|
11:EXCHANGE [UNPARTITIONED]
|
06:AGGREGATE
|  output: sum(l_extendedprice)
|
05:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
|  hash predicates: p_partkey = l_partkey
|  other join predicates: l_quantity < 0.2 * avg(l_quantity)
|  runtime filters: RF000 <- l_partkey
|
|--09:AGGREGATE [FINALIZE]
|  |  output: avg:merge(l_quantity)
|  |  group by: l_partkey
|  |
|  08:EXCHANGE [HASH(l_partkey)]
|  |
|  03:AGGREGATE [STREAMING]
|  |  output: avg(l_quantity)
|  |  group by: l_partkey
|  |
|  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
|     partitions=1/1 files=4 size=292.36MB
|
10:EXCHANGE [HASH(p_partkey)]
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: l_partkey = p_partkey
|  runtime filters: RF002 <- p_partkey
|
|--07:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [tpch_nested_parquet.part p]
|     partitions=1/1 files=1 size=6.24MB
|     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
|     runtime filters: RF000 -> p_partkey
|
00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l]
   partitions=1/1 files=4 size=292.36MB
   runtime filters: RF000 -> l.l_partkey, RF002 -> l_partkey

Verbose plan:
F04:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B
  PLAN-ROOT SINK
  |  mem-estimate=0B mem-reservation=0B
  |
  12:AGGREGATE [FINALIZE]
  |  output: sum:merge(l_extendedprice)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB
  |  tuple-ids=6 row-size=16B cardinality=1
  |
  11:EXCHANGE [UNPARTITIONED]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=6 row-size=16B cardinality=1

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=527.71MB mem-reservation=35.94MB
  DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=11, UNPARTITIONED]
  |  mem-estimate=0B mem-reservation=0B
  06:AGGREGATE
  |  output: sum(l_extendedprice)
  |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB
  |  tuple-ids=6 row-size=16B cardinality=1
  |
  05:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
  |  hash predicates: p_partkey = l_partkey
  |  other join predicates: l_quantity < 0.2 * avg(l_quantity)
  |  runtime filters: RF000[bloom] <- l_partkey
  |  mem-estimate=251.77MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=0,1 row-size=80B cardinality=15000000
  |
  |--10:EXCHANGE [BROADCAST]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=4 row-size=16B cardinality=15000000
  |
  04:HASH JOIN [INNER JOIN, BROADCAST]
  |  hash predicates: l_partkey = p_partkey
  |  fk/pk conjuncts: assumed fk/pk
  |  runtime filters: RF002[bloom] <- p_partkey
  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
  |  tuple-ids=0,1 row-size=80B cardinality=15000000
  |
  |--07:EXCHANGE [BROADCAST]
  |     mem-estimate=0B mem-reservation=0B
  |     tuple-ids=1 row-size=56B cardinality=1000
  |
  00:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l, RANDOM]
     partitions=1/1 files=4 size=292.36MB
     runtime filters: RF000[bloom] -> l.l_partkey, RF002[bloom] -> l_partkey
     stats-rows=150000 extrapolated-rows=disabled
     table stats: rows=150000 size=292.36MB
     column stats: all
     mem-estimate=264.00MB mem-reservation=0B
     tuple-ids=0 row-size=24B cardinality=15000000

F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=48.00MB mem-reservation=0B
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=07, BROADCAST]
  |  mem-estimate=0B mem-reservation=0B
  01:SCAN HDFS [tpch_nested_parquet.part p, RANDOM]
     partitions=1/1 files=1 size=6.23MB
     predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     runtime filters: RF000[bloom] -> p_partkey
     stats-rows=200000 extrapolated-rows=disabled
     table stats: rows=200000 size=6.23MB
     column stats: all
     parquet statistics predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     parquet dictionary predicates: p_container = 'MED BOX', p_brand = 'Brand#23'
     mem-estimate=48.00MB mem-reservation=0B
     tuple-ids=1 row-size=56B cardinality=1000

F03:PLAN FRAGMENT [HASH(l_partkey)] hosts=2 instances=2
Per-Host Resources: mem-estimate=128.00MB mem-reservation=34.00MB
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=10, BROADCAST]
  |  mem-estimate=0B mem-reservation=0B
  09:AGGREGATE [FINALIZE]
  |  output: avg:merge(l_quantity)
  |  group by: l_partkey
  |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=4 row-size=16B cardinality=15000000
  |
  08:EXCHANGE [HASH(l_partkey)]
     mem-estimate=0B mem-reservation=0B
     tuple-ids=3 row-size=16B cardinality=15000000

F02:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=304.00MB mem-reservation=34.00MB
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=08, HASH(l_partkey)]
  |  mem-estimate=0B mem-reservation=0B
  03:AGGREGATE [STREAMING]
  |  output: avg(l_quantity)
  |  group by: l_partkey
  |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB
  |  tuple-ids=3 row-size=16B cardinality=15000000
  |
  02:SCAN HDFS [tpch_nested_parquet.customer.c_orders.o_lineitems l, RANDOM]
     partitions=1/1 files=4 size=292.36MB
     stats-rows=150000 extrapolated-rows=disabled
     table stats: rows=150000 size=292.36MB
     column stats: all
     mem-estimate=176.00MB mem-reservation=0B
     tuple-ids=2 row-size=16B cardinality=15000000

        at org.junit.Assert.fail(Assert.java:88)
        at org.apache.impala.planner.PlannerTestBase.runPlannerTestFile(PlannerTestBase.java:770)
        at org.apache.impala.planner.PlannerTestBase.runPlannerTestFile(PlannerTestBase.java:779)
        at org.apache.impala.planner.PlannerTest.testTpchNested(PlannerTest.java:245)
{noformat}
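
For anyone triaging the output above by hand, here is a minimal sketch of locating the first differing plan line and underlining it with carets, similar in spirit to the "^^^" marker in the logs. This is not the PlannerTestBase implementation: the function names (first_mismatch, mark_mismatch) and the sample strings are hypothetical, and the comparison is a plain line-by-line string compare with no special handling of any fields.

```python
from itertools import zip_longest

# Hypothetical helpers for illustration only; not taken from PlannerTestBase.

def first_mismatch(actual, expected):
    """Return (index, actual_line, expected_line) for the first differing
    line, or None if the two plans are identical line by line.
    A missing line on either side is reported as None."""
    pairs = zip_longest(actual.splitlines(), expected.splitlines())
    for i, (a, e) in enumerate(pairs):
        if a != e:
            return i, a, e
    return None

def mark_mismatch(actual, expected):
    """Return the actual plan with a row of carets inserted under the first
    differing line. If the plans match, or the difference lies past the end
    of the actual plan, the actual plan is returned without a marker."""
    hit = first_mismatch(actual, expected)
    if hit is None:
        return actual
    idx = hit[0]
    out = []
    for i, line in enumerate(actual.splitlines()):
        out.append(line)
        if i == idx:
            out.append("^" * len(line))
    return "\n".join(out)

# Excerpt of the testJoins plans quoted in this report.
ACTUAL = "\n".join([
    "09:EXCHANGE [UNPARTITIONED]",
    "|",
    "04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]",
])
EXPECTED = "\n".join([
    "09:EXCHANGE [UNPARTITIONED]",
    "|",
    "04:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]",
])

print(mark_mismatch(ACTUAL, EXPECTED))
```

In both failures above the marked line is the join node, which suggests the join inversion or distribution-mode choice is where the plans first diverge; later differences (for example the missing runtime filters in testJoins) follow from that.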



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
