Quanlong Huang created IMPALA-15019:
---------------------------------------

             Summary: Calcite planner has higher memory estimation
                 Key: IMPALA-15019
                 URL: https://issues.apache.org/jira/browse/IMPALA-15019
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Quanlong Huang
            Assignee: Steve Carlin
         Attachments: row-size-comparison.txt, tpcds-q4-calcite-plan.txt, tpcds-q4-original-plan.txt

Comparing the EXPLAIN outputs of the original planner and the calcite-planner, 
the calcite-planner seems to always use a larger row-size, which might result 
in higher memory estimates.

For instance, for the following query:
{code:sql}
EXPLAIN SELECT count(*) FROM functional.alltypes
 WHERE year=2009 AND int_col=1 AND string_col='1';{code}
The original planner uses row-size=17B in the scan node, while the 
calcite-planner uses row-size=21B.
Original planner:
{noformat}
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=80MB                    |
| Codegen disabled by planner                                 |
|                                                             |
| PLAN-ROOT SINK                                              |
| |                                                           |
| 03:AGGREGATE [FINALIZE]                                     |
| |  output: count:merge(*)                                   |
| |  row-size=8B cardinality=1                                |
| |                                                           |
| 02:EXCHANGE [UNPARTITIONED]                                 |
| |                                                           |
| 01:AGGREGATE                                                |
| |  output: count(*)                                         |
| |  row-size=8B cardinality=3                                |
| |                                                           |
| 00:SCAN HDFS [functional.alltypes]                          |
|    partition predicates: `year` = 2009                      |
|    HDFS partitions=12/24 files=12 size=238.68KB             |
|    predicates: int_col = 1, string_col = '1'                |
|    row-size=17B cardinality=115                             |
+-------------------------------------------------------------+{noformat}
Calcite-planner:
{noformat}
+--------------------------------------------------------------------------------------+
| Explain String                                                                       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3                          |
| Per-Host Resource Estimates: Memory=80MB                                             |
| Codegen disabled by planner                                                          |
|                                                                                      |
| PLAN-ROOT SINK                                                                       |
| |                                                                                    |
| 03:AGGREGATE [FINALIZE]                                                              |
| |  output: count:merge()                                                             |
| |  row-size=8B cardinality=1                                                         |
| |                                                                                    |
| 02:EXCHANGE [UNPARTITIONED]                                                          |
| |                                                                                    |
| 01:AGGREGATE                                                                         |
| |  output: count()                                                                   |
| |  row-size=8B cardinality=3                                                         |
| |                                                                                    |
| 00:SCAN HDFS [functional.alltypes]                                                   |
|    partition predicates: functional.alltypes.year = 2009                             |
|    HDFS partitions=12/24 files=12 size=238.68KB                                      |
|    predicates: functional.alltypes.int_col = 1, functional.alltypes.string_col = '1' |
|    row-size=21B cardinality=115                                                      |
+--------------------------------------------------------------------------------------+{noformat}
I also compared TPCDS-Q4 as a more complex example. The original planner has a 
lower memory requirement:
{noformat}
Max Per-Host Resource Reservation: Memory=511.00MB Threads=50
Per-Host Resource Estimates: Memory=2.57GB{noformat}
The calcite-planner has a higher memory requirement:
{noformat}
Max Per-Host Resource Reservation: Memory=539.88MB Threads=50
Per-Host Resource Estimates: Memory=2.68GB{noformat}
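To put these gaps in proportion, here is a minimal sketch that computes the relative increase for each pair of numbers quoted above. It is plain arithmetic, not Impala code: `pct_increase` and the `comparisons` table are hypothetical names introduced only for this illustration, and the inputs are copied from the EXPLAIN outputs in this report.

```python
def pct_increase(original: float, calcite: float) -> float:
    """Relative increase of the calcite-planner value over the original, in percent."""
    return (calcite - original) / original * 100.0


# Figures copied from the EXPLAIN outputs quoted in this report.
# Each entry maps a label to (original planner value, calcite-planner value).
comparisons = {
    "scan row-size (B)": (17, 21),
    "TPCDS-Q4 reservation (MB)": (511.00, 539.88),
    "TPCDS-Q4 estimate (GB)": (2.57, 2.68),
}

for name, (orig, calcite) in comparisons.items():
    print(f"{name}: {orig} -> {calcite} (+{pct_increase(orig, calcite):.1f}%)")
```

By this calculation the scan row-size grows by roughly 23.5%, while the TPCDS-Q4 reservation and estimate grow by roughly 5.7% and 4.3%. For the simple query, the per-host estimate is Memory=80MB under both planners despite the row-size difference.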