[ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riza Suminto resolved IMPALA-12657.
-----------------------------------
Resolution: Fixed
> Improve ProcessingCost of ScanNode and NonGroupingAggregator
> ------------------------------------------------------------
>
> Key: IMPALA-12657
> URL: https://issues.apache.org/jira/browse/IMPALA-12657
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 4.3.0
> Reporter: Riza Suminto
> Assignee: David Rorke
> Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt
>
>
> Several benchmark runs measuring Impala scan performance indicate
> opportunities to improve costing around ScanNode and NonGroupingAggregator.
> [^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple
> count query.
> Key takeaways:
> # There is a strong correlation between total materialized bytes (row-size *
> cardinality) and total materialized tuple time per fragment. Row
> materialization cost should be adjusted to be based on this row size instead
> of an equal cost per scan range.
> # NonGroupingAggregator should have a much lower cost than GroupingAggregator.
> In the example above, the cost of NonGroupingAggregator dominates the scan
> fragment even though it only does simple counting rather than hash table
> operations.
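
The two takeaways above can be illustrated with a rough sketch. This is not Impala's planner code: the function names and all three coefficients are hypothetical, chosen only to show the shape of the proposed costing (scan cost proportional to row-size * cardinality, and a cheaper per-row charge for an aggregator that has no grouping).

```python
# Hypothetical sketch; none of these names or coefficients come from Impala.

# Assumed, illustrative weights in arbitrary cost units.
COST_PER_MATERIALIZED_BYTE = 1.0
COST_PER_ROW_NON_GROUPING_AGG = 0.1
COST_PER_ROW_GROUPING_AGG = 1.0


def scan_materialization_cost(cardinality: int, row_size_bytes: float) -> float:
    """Cost proportional to total materialized bytes (row-size * cardinality)."""
    return cardinality * row_size_bytes * COST_PER_MATERIALIZED_BYTE


def aggregator_cost(input_cardinality: int, has_grouping: bool) -> float:
    """Charge less per input row when no hash table is involved."""
    if has_grouping:
        per_row = COST_PER_ROW_GROUPING_AGG
    else:
        per_row = COST_PER_ROW_NON_GROUPING_AGG
    return input_cardinality * per_row


if __name__ == "__main__":
    rows = 1_000_000
    # Same cardinality, different row sizes: the wider row costs more to scan.
    print(scan_materialization_cost(rows, row_size_bytes=8))
    print(scan_materialization_cost(rows, row_size_bytes=64))
    # A simple count (no grouping) is costed below a hash-based aggregation.
    print(aggregator_cost(rows, has_grouping=False))
    print(aggregator_cost(rows, has_grouping=True))
```

Under these assumed weights, two scans over the same number of rows no longer receive the same cost when their row sizes differ, and a simple count no longer dominates the scan fragment the way a hash aggregation would.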
--
This message was sent by Atlassian Jira
(v8.20.10#820010)