Quanlong Huang created IMPALA-15019:
---------------------------------------
Summary: Calcite planner has higher memory estimation
Key: IMPALA-15019
URL: https://issues.apache.org/jira/browse/IMPALA-15019
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang
Assignee: Steve Carlin
Attachments: row-size-comparison.txt, tpcds-q4-calcite-plan.txt, tpcds-q4-original-plan.txt
When comparing the EXPLAIN outputs of the original planner and the Calcite planner, the Calcite planner seems to always use a larger row-size, which may result in higher memory estimates.
For instance, for the following query:
{code:sql}
EXPLAIN SELECT count(*) FROM functional.alltypes
WHERE year=2009 AND int_col=1 AND string_col='1';{code}
The original planner uses row-size=17B in the scan node, while the Calcite planner uses row-size=21B.
Original planner:
{noformat}
+-------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=80MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | row-size=8B cardinality=1 |
| | |
| 02:EXCHANGE [UNPARTITIONED] |
| | |
| 01:AGGREGATE |
| | output: count(*) |
| | row-size=8B cardinality=3 |
| | |
| 00:SCAN HDFS [functional.alltypes] |
| partition predicates: `year` = 2009 |
| HDFS partitions=12/24 files=12 size=238.68KB |
| predicates: int_col = 1, string_col = '1' |
| row-size=17B cardinality=115 |
+-------------------------------------------------------------+{noformat}
Calcite-planner:
{noformat}
+--------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=80MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count:merge() |
| | row-size=8B cardinality=1 |
| | |
| 02:EXCHANGE [UNPARTITIONED] |
| | |
| 01:AGGREGATE |
| | output: count() |
| | row-size=8B cardinality=3 |
| | |
| 00:SCAN HDFS [functional.alltypes] |
| partition predicates: functional.alltypes.year = 2009 |
| HDFS partitions=12/24 files=12 size=238.68KB |
| predicates: functional.alltypes.int_col = 1, functional.alltypes.string_col = '1' |
| row-size=21B cardinality=115 |
+--------------------------------------------------------------------------------------+{noformat}
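One possible reading of the 4-byte gap is that the Calcite plan materializes one extra 4-byte slot in the scan tuple. The Python sketch below checks that arithmetic under two assumptions, neither of which is confirmed by the plans above: that row-size grows additively with per-slot byte sizes (a simplification, not Impala's actual tuple layout code), and that the extra slot is the INT partition column year. The helper name row_size_with_extra_slots is hypothetical.

{code:python}
# Hypothetical illustration only: models the EXPLAIN row-size as a base size plus
# the byte sizes of any additionally materialized slots. The extra slot assumed
# here (the INT partition column `year`) is a guess, not confirmed by the plans.

ORIGINAL_ROW_SIZE = 17  # bytes, 00:SCAN HDFS node in the original planner
CALCITE_ROW_SIZE = 21   # bytes, 00:SCAN HDFS node in the Calcite planner
SCAN_CARDINALITY = 115  # rows, reported identically by both planners
INT_SLOT_BYTES = 4      # fixed width of an INT value


def row_size_with_extra_slots(base_row_size: int, *extra_slot_bytes: int) -> int:
    """Return base_row_size plus the sizes of any additionally materialized slots."""
    return base_row_size + sum(extra_slot_bytes)


# Assumption: Calcite additionally materializes one INT slot (e.g. `year`).
predicted = row_size_with_extra_slots(ORIGINAL_ROW_SIZE, INT_SLOT_BYTES)
gap = CALCITE_ROW_SIZE - ORIGINAL_ROW_SIZE
increase_pct = (CALCITE_ROW_SIZE / ORIGINAL_ROW_SIZE - 1) * 100
extra_bytes_total = gap * SCAN_CARDINALITY  # total extra bytes across scanned rows

print(f"predicted={predicted}B reported={CALCITE_ROW_SIZE}B gap={gap}B")
print(f"per-row increase={increase_pct:.1f}% "
      f"extra bytes at cardinality {SCAN_CARDINALITY}={extra_bytes_total}")
{code}

Under this model the gap is a per-row increase of about 23.5% but only 460 extra bytes at cardinality=115, which is consistent with both plans still reporting Memory=80MB for this small query; the same per-row overhead could weigh more on large intermediate results.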
TPC-DS Q4 was also compared as a more complex example. The original planner has a lower memory requirement:
{noformat}
Max Per-Host Resource Reservation: Memory=511.00MB Threads=50
Per-Host Resource Estimates: Memory=2.57GB{noformat}
The Calcite planner requires more memory:
{noformat}
Max Per-Host Resource Reservation: Memory=539.88MB Threads=50
Per-Host Resource Estimates: Memory=2.68GB{noformat}
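To put the TPC-DS Q4 gap in relative terms, the short Python sketch below computes the absolute and percentage increase of each Calcite figure over the original planner's. The inputs are copied from the EXPLAIN headers above; because EXPLAIN rounds its figures (e.g. 2.57GB), the percentages are approximate. The helper name pct_increase is illustrative.

{code:python}
# Memory figures copied from the TPC-DS Q4 EXPLAIN headers above.
original = {"reservation_mb": 511.00, "estimate_gb": 2.57}
calcite = {"reservation_mb": 539.88, "estimate_gb": 2.68}


def pct_increase(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100


for key in original:
    delta = calcite[key] - original[key]
    pct = pct_increase(original[key], calcite[key])
    print(f"{key}: +{delta:.2f} ({pct:.1f}%)")
{code}

This works out to roughly +28.88MB (about 5.7%) in the per-host reservation and +0.11GB (about 4.3%) in the per-host estimate.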