Riza Suminto created IMPALA-12657:
-------------------------------------

             Summary: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
                 Key: IMPALA-12657
                 URL: https://issues.apache.org/jira/browse/IMPALA-12657
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 4.3.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto
             Fix For: Impala 4.4.0
         Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt

Several benchmark runs measuring Impala scan performance indicate opportunities 
to improve the costing of ScanNode and NonGroupingAggregator.

[^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple 
count-star query.

Key takeaways:
 # There is a strong correlation between total materialized bytes (row size * 
cardinality) and total tuple materialization time per fragment. Row 
materialization cost should be based on row size instead of an equal cost per 
scan fragment.
 # NonGroupingAggregator should have a much lower cost than GroupingAggregator. 
In the example above, the cost of NonGroupingAggregator dominates the scan 
fragment even though it only does simple counting rather than hash table 
operations.
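
A minimal sketch of the two adjustments, for illustration only. The names 
(ScanEstimate, materialization_cost, aggregation_cost) and the weight constants 
are hypothetical and are not Impala's planner code or its actual coefficients. 
The sketch assumes materialization cost scales linearly with row size * 
cardinality, and that a non-grouping aggregator is charged a smaller per-row 
weight than a grouping one.

```python
from dataclasses import dataclass

# Hypothetical weights chosen for illustration; not Impala's real coefficients.
COST_PER_MATERIALIZED_BYTE = 1.0
COST_PER_ROW_GROUPING_AGG = 1.0      # hash table probe/insert per input row
COST_PER_ROW_NON_GROUPING_AGG = 0.1  # simple counter update per input row


@dataclass(frozen=True)
class ScanEstimate:
    cardinality: int     # estimated number of rows produced by the scan
    avg_row_size: float  # estimated materialized bytes per row


def materialization_cost(scan: ScanEstimate) -> float:
    """Cost proportional to total materialized bytes (row size * cardinality)."""
    total_bytes = scan.cardinality * scan.avg_row_size
    return total_bytes * COST_PER_MATERIALIZED_BYTE


def aggregation_cost(input_rows: int, has_grouping_exprs: bool) -> float:
    """Charge less per row when no hash table is involved (e.g. count(*))."""
    if has_grouping_exprs:
        per_row = COST_PER_ROW_GROUPING_AGG
    else:
        per_row = COST_PER_ROW_NON_GROUPING_AGG
    return input_rows * per_row
```

Under these assumptions, two scans with the same cardinality but an 8x 
difference in row size get an 8x difference in materialization cost, and a 
count-star aggregation is costed below a grouping aggregation over the same 
input.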



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

