Riza Suminto created IMPALA-13333:
-------------------------------------
Summary: Curb memory estimation for SORT node
Key: IMPALA-13333
URL: https://issues.apache.org/jira/browse/IMPALA-13333
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Riza Suminto
A large cardinality overestimate can lead to a severe memory overestimate for
a SORT node, even in a parallel plan. The TPC-DS Q31 and Q51 plans against a
synthetic 3TB-scale workload show this kind of overestimation:
[https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test#L1319-L1323]
[https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q51.test#L511-L515]
The planner should avoid estimating terabytes or petabytes of memory for a SORT
node, given that a SORT node can spill to disk under memory pressure. The
planner can also take the SORT_RUN_BYTES_LIMIT or MAX_SORT_RUN_SIZE option
value into account to arrive at a lower memory estimate.
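
A minimal sketch of the idea, in Python rather than the frontend's Java, and
not the actual planner code. The function name, its parameters, and the 8 GiB
ceiling are hypothetical. It also assumes that a positive SORT_RUN_BYTES_LIMIT
is a byte count that bounds how much the sort buffers before spilling, and it
does not model MAX_SORT_RUN_SIZE.

```python
# Hypothetical ceiling for a spillable sort; not a value taken from Impala.
DEFAULT_SORT_MEM_CAP_BYTES = 8 * 1024**3


def estimate_sort_mem_bytes(cardinality, avg_row_bytes, num_instances,
                            min_reservation_bytes, sort_run_bytes_limit=-1,
                            cap_bytes=DEFAULT_SORT_MEM_CAP_BYTES):
    """Return an illustrative per-instance memory estimate for a SORT node."""
    # Naive estimate: each instance holds its whole share of the input in memory.
    naive = (cardinality * avg_row_bytes) // max(num_instances, 1)

    # The sort can spill to disk, so clamp the estimate to a fixed ceiling.
    estimate = min(naive, cap_bytes)

    # Assumption: a positive run limit bounds the bytes buffered before a spill.
    if sort_run_bytes_limit > 0:
        estimate = min(estimate, sort_run_bytes_limit)

    # Never estimate below what the node needs in order to run at all.
    return max(estimate, min_reservation_bytes)
```

With an overestimated cardinality of 10**12 rows at 100 bytes each across 10
instances, the naive per-instance estimate is 10**13 bytes, while the curbed
estimate stays at the ceiling, or at the run limit when one is set.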
--
This message was sent by Atlassian Jira
(v8.20.10#820010)