[ 
https://issues.apache.org/jira/browse/IMPALA-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-14594.
------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> Grouping pre-agg's estimated input cardinality is not scaled by the number of 
> threads
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14594
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14594
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> Grouping pre-aggs decide whether to keep expanding based on whether the 
> aggregation is achieving reduction. It has two measurements: Current 
> reduction is (# input rows - # rows returned passthrough) / # hash table 
> entries. Predicted total reduction is given by this formula (taken from 
> [be/src/exec/grouping-aggregator.cc|https://github.com/apache/impala/blob/4be5fd8896dcd445a6379bdcda4bdcf318f24511/be/src/exec/grouping-aggregator.cc#L376-L380]):
> Extrapolate the current reduction factor (r) using the formula R = 1 + (N/n) 
> * (r - 1), where R is the reduction factor over the full input data set, N is 
> the number of input rows, excluding passed-through rows, and n is the number 
> of rows inserted or merged into the hash tables. This is a very rough 
> approximation but is good enough to be useful.
> The predicted total reduction formula is sensitive to the estimated input 
> cardinality. A fragment instance is only going to be processing a fraction of 
> the input rows, so the estimated input cardinality should be scaled down to 
> the number of input rows expected to be processed by this fragment instance 
> (i.e. the total value should be divided by the number of fragment instances). 
> That is currently not happening, so the estimated reduction is significantly 
> higher than it should be. This causes pre-aggs to expand more aggressively.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to