Riza Suminto created IMPALA-13644:
-------------------------------------

             Summary: Generalize and move getPerInstanceNdvForCpuCosting into 
AggregationNode.
                 Key: IMPALA-13644
                 URL: https://issues.apache.org/jira/browse/IMPALA-13644
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.4.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto


getPerInstanceNdvForCpuCosting estimates the number of distinct values of
exprs per fragment instance, accounting for the likelihood of duplicate keys
across fragment instances. It borrows the probabilistic model from the
formula described in IMPALA-2945. The method is used only by
AggregationNode.

[https://github.com/apache/impala/blob/99529db6ad62ddc34cbfd924d7e41b1fce5b60a2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java#L634-L642]
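
For illustration only, here is a minimal sketch of the kind of model involved.
It is not the code in PlanFragment.java. It assumes the textbook
"draw n rows uniformly from D distinct values" expectation, D * (1 - (1 - 1/D)^n),
which may differ in detail from the formula in IMPALA-2945. The class name,
method name, and parameters below are hypothetical.

```java
/** Hypothetical helper; not part of Impala. */
final class PerInstanceNdvSketch {
  private PerInstanceNdvSketch() {}

  /**
   * Expected number of distinct values seen by a single fragment instance,
   * assuming rows are spread evenly across instances and each row's key is
   * drawn uniformly from globalNdv distinct values (an assumption).
   */
  static double estimatePerInstanceNdv(
      double globalNdv, double inputRows, int numInstances) {
    if (globalNdv <= 0 || inputRows <= 0 || numInstances <= 0) return 0.0;
    double rowsPerInstance = inputRows / numInstances;
    // With at most one distinct value the formula degenerates; cap directly.
    if (globalNdv <= 1.0) return Math.min(globalNdv, rowsPerInstance);
    // Probability that a given value appears at least once in this instance:
    // 1 - (1 - 1/D)^n, computed with log1p/expm1 for numerical stability.
    double probSeen =
        -Math.expm1(rowsPerInstance * Math.log1p(-1.0 / globalNdv));
    double expected = globalNdv * probSeen;
    // Never exceed the global NDV or the rows available to the instance.
    return Math.min(expected, Math.min(globalNdv, rowsPerInstance));
  }
}
```

Under this model, when each instance processes many more rows than there are
distinct values, the per-instance NDV approaches the global NDV, because
almost every key is duplicated across instances. When rows per instance is
far below the global NDV, the estimate approaches the per-instance row count.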
 

We should move this method to AggregationNode and generalize it to accept
the NDV estimate calculated in AggregationNode.computeStats() as input. The
number from computeStats() should be more precise now, after the
improvements from IMPALA-13405, IMPALA-13526, and IMPALA-13622.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
