Riza Suminto created IMPALA-12183:
-------------------------------------

             Summary: Maintain cardinality clamping across multi-phase 
aggregation
                 Key: IMPALA-12183
                 URL: https://issues.apache.org/jira/browse/IMPALA-12183
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.2.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto


In the Impala planner, an aggregation node's cardinality is the sum of the 
cardinalities of its aggregation classes. An aggregation class's cardinality 
is the product of the NDVs (numbers of distinct values) of its grouping 
columns. Since this product can exceed the aggregation node's input 
cardinality, each aggregation class cardinality is further clamped at the 
aggregation node's input cardinality.
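
The computation described above can be sketched as follows. This is a minimal 
illustration, not the planner's actual code: the function names and the 
list-of-NDV-lists representation of aggregation classes are assumptions made 
for this example.

```python
from math import prod


def agg_class_cardinality(ndvs, input_cardinality):
    """Product of the grouping-column NDVs, clamped at the input cardinality.

    `ndvs` is a hypothetical list holding the NDV of each grouping column.
    """
    return min(prod(ndvs), input_cardinality)


def agg_node_cardinality(classes, input_cardinality):
    """Sum of the clamped cardinalities of all aggregation classes."""
    return sum(agg_class_cardinality(ndvs, input_cardinality) for ndvs in classes)


# Two aggregation classes over an input of 1000 rows.
# Class 1: 100 * 50 = 5000, clamped to 1000. Class 2: 10, left unchanged.
classes = [[100, 50], [10]]
print(agg_node_cardinality(classes, 1000))  # 1010
```

Note that the clamp applies per class, so a node with several aggregation 
classes can still report an output cardinality larger than its input.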

An aggregation operator can translate into a chain of multi-phase aggregation 
plan nodes. The longest possible chain of aggregation phases is as follows, 
from bottom to top:
 # FIRST
 # FIRST_MERGE
 # SECOND
 # SECOND_MERGE
 # TRANSPOSE

The FIRST_MERGE aggregation clamps each aggregation class cardinality at the 
input cardinality of its corresponding FIRST aggregation (SECOND_MERGE and 
SECOND have the same relationship). However, the SECOND aggregation is clamped 
at the FIRST_MERGE output cardinality instead of the FIRST input cardinality. 
This cardinality mispropagation can cause a cardinality explosion in the later 
aggregation phases and in the plan nodes above them.
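
A toy calculation shows how the choice of clamp value changes the SECOND 
phase estimate. This is only a sketch under simplifying assumptions: the 
helper name is hypothetical, the NDVs are made up, and the same grouping 
columns are reused in every phase, which the real planner does not 
necessarily do.

```python
from math import prod


def node_cardinality(classes, clamp):
    """Sum of per-class NDV products, each clamped at `clamp` (hypothetical helper)."""
    return sum(min(prod(ndvs), clamp) for ndvs in classes)


first_input = 1000                 # input cardinality of the FIRST aggregation
classes = [[500, 40], [300, 30]]   # NDV products: 20000 and 9000

# FIRST and FIRST_MERGE both clamp each class at the FIRST input cardinality.
first = node_cardinality(classes, first_input)
first_merge = node_cardinality(classes, first_input)

# Described behavior: SECOND clamps at the FIRST_MERGE output cardinality.
second_unmaintained = node_cardinality(classes, first_merge)

# Maintained clamping: SECOND clamps at the FIRST input cardinality.
second_maintained = node_cardinality(classes, first_input)

print(first_merge, second_unmaintained, second_maintained)  # 2000 4000 2000
```

In this example the FIRST_MERGE output (2000) already exceeds the FIRST input 
(1000) because there are two classes, so using it as the clamp doubles the 
SECOND estimate.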



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
