[
https://issues.apache.org/jira/browse/IMPALA-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-14594.
------------------------------------
Fix Version/s: Impala 5.0.0
Resolution: Fixed
> Grouping pre-agg's estimated input cardinality is not scaled by the number of
> threads
> -------------------------------------------------------------------------------------
>
> Key: IMPALA-14594
> URL: https://issues.apache.org/jira/browse/IMPALA-14594
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> Grouping pre-aggs decide whether to keep expanding based on whether the
> aggregation is achieving reduction. It has two measurements: Current
> reduction is (# input rows - # rows returned passthrough) / # hash table
> entries. Predicted total reduction is given by this formula (taken from
> [be/src/exec/grouping-aggregator.cc|https://github.com/apache/impala/blob/4be5fd8896dcd445a6379bdcda4bdcf318f24511/be/src/exec/grouping-aggregator.cc#L376-L380]):
> Extrapolate the current reduction factor (r) using the formula R = 1 + (N/n)
> * (r - 1), where R is the reduction factor over the full input data set, N is
> the number of input rows, excluding passed-through rows, and n is the number
> of rows inserted or merged into the hash tables. This is a very rough
> approximation but is good enough to be useful.
> The predicted total reduction formula is sensitive to the estimated input
> cardinality. A fragment instance is only going to be processing a fraction of
> the input rows, so the estimated input cardinality should be scaled down to
> the number of input rows expected to be processed by this fragment instance
> (i.e. the total value should be divided by the number of fragment instances).
> That is currently not happening, so the estimated reduction is significantly
> higher than it should be. This causes pre-aggs to expand more aggressively.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)