Riza Suminto created IMPALA-13644:
-------------------------------------
Summary: Generalize and move getPerInstanceNdvForCpuCosting into
AggregationNode.
Key: IMPALA-13644
URL: https://issues.apache.org/jira/browse/IMPALA-13644
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto
getPerInstanceNdvForCpuCosting estimates the number of distinct values of
exprs per fragment instance, accounting for the likelihood of duplicate keys
across fragment instances. It borrows the probabilistic model from the formula
described in IMPALA-2945. This method is used only by AggregationNode.
[https://github.com/apache/impala/blob/99529db6ad62ddc34cbfd924d7e41b1fce5b60a2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java#L634-L642]
We should move this method to AggregationNode and generalize it to accept the
NDV estimate calculated in AggregationNode.computeStats() as input. The number
from computeStats() should be more precise now, after the improvements from
IMPALA-13405, IMPALA-13526, and IMPALA-13622.