Riza Suminto created IMPALA-12183:
-------------------------------------
Summary: Maintain cardinality clamping across multi-phase aggregation
Key: IMPALA-12183
URL: https://issues.apache.org/jira/browse/IMPALA-12183
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.2.0
Reporter: Riza Suminto
Assignee: Riza Suminto
In the Impala planner, an aggregation node's cardinality is the sum of the
cardinalities of its aggregation classes. An aggregation class's cardinality is
the product of the NDVs of its contributing grouping columns. Since this
product can exceed the aggregation node's input cardinality, each aggregation
class's cardinality is further clamped at the aggregation node's input
cardinality.
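
The estimate described above can be sketched as follows. This is an illustration only, not the planner's actual code: the function names and the numbers are hypothetical, and each aggregation class is modeled simply as a list of grouping-column NDVs.

```python
from math import prod


def aggregation_class_cardinality(grouping_ndvs, input_cardinality):
    """Product of the grouping-column NDVs, clamped at the input cardinality."""
    return min(prod(grouping_ndvs), input_cardinality)


def aggregation_node_cardinality(classes, input_cardinality):
    """Sum of the clamped cardinalities of all aggregation classes."""
    return sum(
        aggregation_class_cardinality(ndvs, input_cardinality)
        for ndvs in classes
    )


# Hypothetical example: two aggregation classes over an input of 1,000 rows.
# Class 1: NDVs 100 * 50 = 5,000, clamped to 1,000.
# Class 2: NDVs 20 * 10 = 200, below the clamp.
classes = [[100, 50], [20, 10]]
print(aggregation_node_cardinality(classes, 1_000))  # 1000 + 200 = 1200
```

Note that with several aggregation classes the node's output estimate can exceed its input cardinality, because the clamp is applied per class rather than to the sum.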
An aggregation operator can translate into a chain of multi-phase aggregation
plan nodes. The longest possible chain of aggregation phases is as follows,
from the bottom to the top:
# FIRST
# FIRST_MERGE
# SECOND
# SECOND_MERGE
# TRANSPOSE
The FIRST_MERGE aggregation clamps its aggregation class cardinalities at the
input cardinality of its corresponding FIRST aggregation (SECOND_MERGE and
SECOND have a similar relationship). However, the SECOND aggregation was
clamped at the FIRST_MERGE output cardinality instead of the FIRST input
cardinality. This cardinality mispropagation can cause a cardinality explosion
in the later aggregation phases and in the plan nodes above them.
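
A rough numeric sketch of how the choice of clamp can inflate the estimate. It is a simplified model under stated assumptions, not the planner's code: the helper name and the numbers are hypothetical, and the SECOND phase is assumed to reuse the same aggregation classes as the FIRST phase.

```python
from math import prod


def clamped_sum(classes, clamp):
    """Sum of per-class NDV products, each clamped at `clamp`."""
    return sum(min(prod(ndvs), clamp) for ndvs in classes)


first_input = 1_000               # hypothetical FIRST input cardinality
classes = [[100, 50], [200, 40]]  # NDV products: 5,000 and 8,000

# FIRST and FIRST_MERGE both clamp each class at the FIRST input cardinality.
first_merge_output = clamped_sum(classes, first_input)

# Reported behavior: SECOND clamps at the FIRST_MERGE output cardinality.
second_reported = clamped_sum(classes, first_merge_output)

# Behavior the issue asks for: SECOND keeps the FIRST input cardinality clamp.
second_expected = clamped_sum(classes, first_input)

print(first_merge_output, second_reported, second_expected)
```

In this model the FIRST_MERGE output (2,000) is already larger than the FIRST input (1,000) because there are two classes, so using it as the clamp for SECOND doubles the estimate again (4,000 instead of 2,000).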