Fang-Yu Rao created IMPALA-13490:
------------------------------------

             Summary: TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469
                 Key: IMPALA-13490
                 URL: https://issues.apache.org/jira/browse/IMPALA-13490
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.5.0
            Reporter: Fang-Yu Rao
            Assignee: Riza Suminto


We found that testNonTpcdsDdl() in 
[TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java]
 can fail after IMPALA-13469 with the errors shown below.

In the single-node plan, the actual value of 'segment-costs' does not match the 
expected one.

+*Error Message*+
{code:java}
Section PLAN of query at line 651:
create table t partitioned by (c_nationkey) sort by (c_custkey) as
select c_custkey, max(o_totalprice) as maxprice, c_nationkey
  from tpch.orders join tpch.customer on c_custkey = o_custkey
 where c_nationkey < 10
 group by c_custkey, c_nationkey

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB thread-reservation=1 runtime-filters-memory=1.00MB
|  max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|
04:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
|  in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
|  in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  runtime filters: RF000[bloom] <- c_custkey
|  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--01:SCAN HDFS [tpch.customer]
|     HDFS partitions=1/1 files=1 size=23.08MB
|     predicates: c_nationkey < CAST(10 AS SMALLINT)
|     stored statistics:
|       table: rows=150.00K size=23.08MB
|       columns: all
|     extrapolated-rows=disabled max-scan-range-rows=150.00K
|     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
|     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
|     in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders]
   HDFS partitions=1/1 files=1 size=162.56MB
   runtime filters: RF000[bloom] -> o_custkey
   stored statistics:
     table: rows=1.50M size=162.56MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.18M
   mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
   tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB thread-reservation=1 runtime-filters-memory=1.00MB
|  max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|
04:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
|  in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
|  in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  runtime filters: RF000[bloom] <- c_custkey
|  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--01:SCAN HDFS [tpch.customer]
|     HDFS partitions=1/1 files=1 size=23.08MB
|     predicates: c_nationkey < CAST(10 AS SMALLINT)
|     stored statistics:
|       table: rows=150.00K size=23.08MB
|       columns: all
|     extrapolated-rows=disabled max-scan-range-rows=150.00K
|     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
|     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
|     in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders]
   HDFS partitions=1/1 files=1 size=162.56MB
   runtime filters: RF000[bloom] -> o_custkey
   stored statistics:
     table: rows=1.50M size=162.56MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.18M
   mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
   tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
   in pipelines: 00(GETNEXT)
{code}
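
As a reading aid for the PLAN output above, the sketch below parses the two 'segment-costs' lists and reports which segments differ. It is only an observation on the printed numbers, not a description of how the planner computes segment costs; the helper name segment_costs is made up for this illustration. In this output, the first segment happens to equal the sum of the costs of nodes 00 to 03 in both plans, and the two differing segments equal the costs of 04:SORT and WRITE TO HDFS, whose input cardinality is 228.68K in the actual plan and 15.00K in the expected one.

```python
import re

# Fragment header lines copied from the PLAN output above.
ACTUAL = "|  max-parallelism=1 segment-costs=[8689789, 272154, 4822204]"
EXPECTED = "|  max-parallelism=1 segment-costs=[8689789, 17851, 3700630]"


def segment_costs(line):
    """Return the segment-costs list found in a fragment header line."""
    match = re.search(r"segment-costs=\[([^\]]*)\]", line)
    if match is None:
        raise ValueError("no segment-costs in line: " + line)
    return [int(token) for token in match.group(1).split(",")]


actual = segment_costs(ACTUAL)
expected = segment_costs(EXPECTED)

# Indices of the segments whose cost differs between the two plans.
differing = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

# Per-node costs printed in both plans for 00:SCAN, 01:SCAN, 02:HASH JOIN
# and 03:AGGREGATE. That they add up to the first segment is an observation
# on this output only.
first_segment_nodes = [6034006, 864778, 441187, 1349818]

print("differing segments:", differing)
print("first segment equals node sum:", sum(first_segment_nodes) == actual[0])
```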

Moreover, in the distributed plan, the actual 'Memory' value of 'Max Per-Host 
Resource Reservation' (35.69MB) does not match the expected one (32.88MB).
{code}
Section DISTRIBUTEDPLAN of query at line 651:
create table t partitioned by (c_nationkey) sort by (c_custkey) as
select c_custkey, max(o_totalprice) as maxprice, c_nationkey
  from tpch.orders join tpch.customer on c_custkey = o_custkey
 where c_nationkey < 10
 group by c_custkey, c_nationkey

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=35.69MB Threads=5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Per-Host Resource Estimates: Memory=66MB
F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
|  Per-Instance Resources: mem-estimate=8.01MB mem-reservation=6.00MB thread-reservation=1
|  max-parallelism=2 segment-costs=[316495, 4822204] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|
08:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
|  in pipelines: 08(GETNEXT), 06(OPEN)
|
07:EXCHANGE [HASH(c_nationkey)]
|  mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
|  in pipelines: 06(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
Per-Instance Resources: mem-estimate=12.01MB mem-reservation=4.75MB thread-reservation=1
max-parallelism=2 segment-costs=[1394159, 382905] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
06:AGGREGATE [FINALIZE]
|  output: max:merge(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=4.75MB spill-buffer=256.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
|  in pipelines: 06(GETNEXT), 00(OPEN)
|
05:EXCHANGE [HASH(c_custkey,c_nationkey)]
|  mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB thread-reservation=1
max-parallelism=2 segment-costs=[7810011, 382905] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
03:AGGREGATE [STREAMING]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
|  in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash-table-id=00
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
|  |  Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB thread-reservation=1 runtime-filters-memory=1.00MB
|  |  max-parallelism=2 segment-costs=[18986]
|  JOIN BUILD
|  |  join-table-id=00 plan-id=01 cohort-id=01
|  |  build expressions: c_custkey
|  |  runtime filters: RF000[bloom] <- c_custkey
|  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 cost=15000
|  |
|  04:EXCHANGE [BROADCAST]
|  |  mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
|  |  tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
|  |  in pipelines: 01(GETNEXT)
|  |
|  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB thread-reservation=1
|  max-parallelism=1 segment-costs=[865507]
|  01:SCAN HDFS [tpch.customer, RANDOM]
|     HDFS partitions=1/1 files=1 size=23.08MB
|     predicates: c_nationkey < CAST(10 AS SMALLINT)
|     stored statistics:
|       table: rows=150.00K size=23.08MB
|       columns: all
|     extrapolated-rows=disabled max-scan-range-rows=150.00K
|     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
|     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
|     in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders, RANDOM]
   HDFS partitions=1/1 files=1 size=162.56MB
   runtime filters: RF000[bloom] -> o_custkey
   stored statistics:
     table: rows=1.50M size=162.56MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.18M
   mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
   tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=32.88MB Threads=5
Per-Host Resource Estimates: Memory=63MB
F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
|  Per-Instance Resources: mem-estimate=6.17MB mem-reservation=6.00MB thread-reservation=1
|  max-parallelism=2 segment-costs=[20759, 3700630] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|
08:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
|  in pipelines: 08(GETNEXT), 06(OPEN)
|
07:EXCHANGE [HASH(c_nationkey)]
|  mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
|  in pipelines: 06(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
Per-Instance Resources: mem-estimate=10.17MB mem-reservation=1.94MB thread-reservation=1
max-parallelism=2 segment-costs=[91447, 25116] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
06:AGGREGATE [FINALIZE]
|  output: max:merge(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=88539
|  in pipelines: 06(GETNEXT), 00(OPEN)
|
05:EXCHANGE [HASH(c_custkey,c_nationkey)]
|  mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB thread-reservation=1
max-parallelism=2 segment-costs=[7810011, 25116] cpu-comparison-result=3 [max(2 (self) vs 3 (sum children))]
03:AGGREGATE [STREAMING]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
|  in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash-table-id=00
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
|  |  Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB thread-reservation=1 runtime-filters-memory=1.00MB
|  |  max-parallelism=2 segment-costs=[18986]
|  JOIN BUILD
|  |  join-table-id=00 plan-id=01 cohort-id=01
|  |  build expressions: c_custkey
|  |  runtime filters: RF000[bloom] <- c_custkey
|  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 cost=15000
|  |
|  04:EXCHANGE [BROADCAST]
|  |  mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
|  |  tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
|  |  in pipelines: 01(GETNEXT)
|  |
|  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB thread-reservation=1
|  max-parallelism=1 segment-costs=[865507]
|  01:SCAN HDFS [tpch.customer, RANDOM]
|     HDFS partitions=1/1 files=1 size=23.08MB
|     predicates: c_nationkey < CAST(10 AS SMALLINT)
|     stored statistics:
|       table: rows=150.00K size=23.08MB
|       columns: all
|     extrapolated-rows=disabled max-scan-range-rows=150.00K
|     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
|     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
|     in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders, RANDOM]
   HDFS partitions=1/1 files=1 size=162.56MB
   runtime filters: RF000[bloom] -> o_custkey
   stored statistics:
     table: rows=1.50M size=162.56MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.18M
   mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
   tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
   in pipelines: 00(GETNEXT)
{code}
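
To help localize the 'Memory' mismatch in the DISTRIBUTEDPLAN output above, the sketch below adds up the mem-reservation values printed for each fragment. It is arithmetic on the printed numbers only and assumes nothing about how the planner derives the header value; the names COMMON_MB, F02_MB, HEADER_MB and total_reservation are made up for this illustration. In this output the only fragment reservation that differs is F02 (4.75MB actual vs 1.94MB expected), and that 2.81MB gap equals the gap between the two header values.

```python
# mem-reservation values (in MB) printed identically in both the actual and
# the expected distributed plan above.
COMMON_MB = {
    "F03": 6.00,
    "F00 per-instance": 13.00,
    "F00 per-host shared": 1.00,
    "F04": 2.94,
    "F01": 8.00,
}

# The only fragment whose mem-reservation differs between the two plans.
F02_MB = {"actual": 4.75, "expected": 1.94}

# 'Max Per-Host Resource Reservation: Memory=...' from each plan header.
HEADER_MB = {"actual": 35.69, "expected": 32.88}


def total_reservation(f02_mb):
    """Sum the printed fragment reservations, rounded to 2 decimals."""
    return round(sum(COMMON_MB.values()) + f02_mb, 2)


for label in ("actual", "expected"):
    print(label, total_reservation(F02_MB[label]), HEADER_MB[label])

header_gap = round(HEADER_MB["actual"] - HEADER_MB["expected"], 2)
f02_gap = round(F02_MB["actual"] - F02_MB["expected"], 2)
print("header gap:", header_gap, "F02 gap:", f02_gap)
```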




