[ 
https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto updated IMPALA-12657:
----------------------------------
    Description: 
Several benchmark runs measuring Impala scan performance indicate opportunities 
to improve costing around ScanNode and NonGroupingAggregator.

[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple 
count star query.

Key takeaways:
 # There is a strong correlation between total materialized bytes (row-size * 
cardinality) and total materialized tuple time per fragment. Row 
materialization cost should be based on this row size instead of an equal 
cost per scan range.
 # NonGroupingAggregator should have a much lower cost than GroupingAggregator. 
In the example above, the cost of NonGroupingAggregator dominates the scan 
fragment even though it only does simple counting rather than hash table 
operations.
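
The first takeaway can be sketched as follows. This is only an illustration of 
the proposed idea, not Impala planner code: the ScanEstimate class, the 
function names, and the COST_PER_MATERIALIZED_BYTE coefficient are all 
hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class ScanEstimate:
    """Hypothetical planner estimates for one scan (not an Impala class)."""
    row_size_bytes: float  # estimated size of one materialized row
    cardinality: int       # estimated number of rows the scan produces


# Hypothetical coefficient: cost units per materialized byte.
COST_PER_MATERIALIZED_BYTE = 1.0


def materialized_bytes(est: ScanEstimate) -> float:
    # Total materialized bytes = row-size * cardinality, as in takeaway 1.
    return est.row_size_bytes * est.cardinality


def materialization_cost(est: ScanEstimate,
                         cost_per_byte: float = COST_PER_MATERIALIZED_BYTE) -> float:
    # Cost scales with the bytes materialized rather than being a fixed
    # amount per scan range.
    return materialized_bytes(est) * cost_per_byte


# Two scans with the same row count but different row widths.
narrow = ScanEstimate(row_size_bytes=8, cardinality=1_000_000)
wide = ScanEstimate(row_size_bytes=256, cardinality=1_000_000)
```

Under this sketch the wide scan is costed 32 times higher than the narrow one, 
whereas an equal cost per scan range would not distinguish them.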

  was:
Several benchmark runs measuring Impala scan performance indicate opportunities 
to improve costing around ScanNode and NonGroupingAggregator.

[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple 
count star query.

Key takeaways:
 # There is a strong correlation between total materialized bytes (row-size * 
cardinality) and total materialized tuple time per fragment. Row 
materialization cost should be based on this row size instead of an equal 
cost per scan fragment.
 # NonGroupingAggregator should have a much lower cost than GroupingAggregator. 
In the example above, the cost of NonGroupingAggregator dominates the scan 
fragment even though it only does simple counting rather than hash table 
operations.


> Improve ProcessingCost of ScanNode and NonGroupingAggregator
> ------------------------------------------------------------
>
>                 Key: IMPALA-12657
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12657
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>         Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt
>
>
> Several benchmark runs measuring Impala scan performance indicate 
> opportunities to improve costing around ScanNode and NonGroupingAggregator.
> [^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple 
> count star query.
> Key takeaways:
>  # There is a strong correlation between total materialized bytes (row-size * 
> cardinality) and total materialized tuple time per fragment. Row 
> materialization cost should be based on this row size instead of an equal 
> cost per scan range.
>  # NonGroupingAggregator should have a much lower cost than GroupingAggregator. 
> In the example above, the cost of NonGroupingAggregator dominates the scan 
> fragment even though it only does simple counting rather than hash table 
> operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
