Riza Suminto created IMPALA-12657:
-------------------------------------
Summary: Improve ProcessingCost of ScanNode and NonGroupingAggregator
Key: IMPALA-12657
URL: https://issues.apache.org/jira/browse/IMPALA-12657
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 4.3.0
Reporter: Riza Suminto
Assignee: Riza Suminto
Fix For: Impala 4.4.0
Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt
Several benchmark runs measuring Impala scan performance indicate some costing
improvement opportunities around ScanNode and NonGroupingAggregator.
[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple
count-star query.
Key takeaways:
# There is a strong correlation between total materialized bytes (row-size *
cardinality) and total materialized tuple time per fragment. Row
materialization cost should therefore be based on row size instead of being
charged as an equal cost per scan fragment.
# NonGroupingAggregator should have a much lower cost than GroupingAggregator.
In the example above, the cost of NonGroupingAggregator dominates the scan fragment
even though it only does simple counting instead of hash table operations.
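A minimal sketch of the two proposed adjustments, assuming a simple linear cost model. The `ScanFragment` class, the `materialization_cost` and `aggregation_cost` functions, and all cost coefficients below are hypothetical illustrations, not Impala's actual ProcessingCost API or constants.

```python
from dataclasses import dataclass

# Hypothetical per-unit cost coefficients (placeholders for illustration only,
# not values taken from Impala's ProcessingCost implementation).
COST_PER_MATERIALIZED_BYTE = 0.01
COST_PER_ROW_SIMPLE_AGG = 0.1   # e.g. incrementing a counter for count(*)
COST_PER_ROW_HASH_AGG = 1.0     # hash, probe, and update a hash table entry


@dataclass
class ScanFragment:
    """Hypothetical summary of one scan fragment's output."""
    cardinality: int   # rows produced by the scan
    row_size: float    # average materialized row size in bytes


def materialization_cost(frag: ScanFragment) -> float:
    """Cost proportional to total materialized bytes (row_size * cardinality)."""
    return frag.row_size * frag.cardinality * COST_PER_MATERIALIZED_BYTE


def aggregation_cost(input_rows: int, grouping: bool) -> float:
    """Linear per-row cost; non-grouping aggregation skips hash table work."""
    per_row = COST_PER_ROW_HASH_AGG if grouping else COST_PER_ROW_SIMPLE_AGG
    return input_rows * per_row


# Same cardinality, different row widths.
narrow = ScanFragment(cardinality=1_000_000, row_size=8)
wide = ScanFragment(cardinality=1_000_000, row_size=200)
print(materialization_cost(narrow), materialization_cost(wide))
print(aggregation_cost(1_000_000, grouping=False),
      aggregation_cost(1_000_000, grouping=True))
```

Under an equal cost per scan fragment, `narrow` and `wide` would be charged the same; the byte-based sketch charges `wide` 25 times more, in line with the bytes-vs-time correlation described in the first takeaway. Likewise, the non-grouping path is charged far less per row than the hash-based path.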