Riza Suminto created IMPALA-13333:
-------------------------------------

             Summary: Curb memory estimation for SORT node
                 Key: IMPALA-13333
                 URL: https://issues.apache.org/jira/browse/IMPALA-13333
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Riza Suminto


A large cardinality overestimate can lead to severe memory overestimation for 
the SORT node, even in a parallel plan. The TPC-DS Q31 and Q51 plans against a 
synthetic 3TB-scale workload show this kind of overestimation:

[https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test#L1319-L1323]

[https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q51.test#L511-L515]

The planner should avoid estimating terabytes or petabytes of memory for a SORT 
node, given that the SORT node can spill to disk under memory pressure. The 
planner can also take the SORT_RUN_BYTES_LIMIT or MAX_SORT_RUN_SIZE option 
value into account to arrive at a lower memory estimate.
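
A minimal sketch of the capping idea, written in Python for illustration only 
and not taken from the Impala planner. The function name, its parameters, and 
the planner-wide ceiling are all hypothetical. run_bytes_limit stands in for a 
byte-valued limit derived from an option such as SORT_RUN_BYTES_LIMIT; how 
Impala actually interprets SORT_RUN_BYTES_LIMIT or MAX_SORT_RUN_SIZE is not 
modeled here.

```python
def capped_sort_mem_estimate(cardinality, avg_row_bytes, min_reservation_bytes,
                             run_bytes_limit=0, spill_cap_bytes=None):
    """Return a SORT memory estimate in bytes, bounded because the node can spill.

    All names and parameters are hypothetical; this is not Impala's code.
    """
    # Naive estimate: assume every input row is held in memory at once.
    estimate = max(0, cardinality) * max(0, avg_row_bytes)

    # Assumed cap 1: a byte limit derived from a query option.
    # A value of 0 or less is treated as "not set".
    if run_bytes_limit > 0:
        estimate = min(estimate, run_bytes_limit)

    # Assumed cap 2: an optional planner-wide ceiling, justified by the
    # node's ability to spill to disk instead of holding all rows.
    if spill_cap_bytes is not None:
        estimate = min(estimate, spill_cap_bytes)

    # The estimate should never drop below the node's minimum reservation.
    return max(estimate, min_reservation_bytes)
```

With an overestimated cardinality of 10**12 rows at 100 bytes each, the naive 
estimate is 10**14 bytes; passing run_bytes_limit=2 * 1024**3 brings the result 
down to 2 GiB in this sketch, while small inputs still get at least the minimum 
reservation.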



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
