[
https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918572#comment-16918572
]
ASF subversion and git services commented on IMPALA-7604:
---------------------------------------------------------
Commit a5cf105d95e9c15ce8065279f26366e021b5a432 in impala's branch
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a5cf105 ]
IMPALA-7604: part 1: tests for agg cardinality
Add tests that demonstrate bugs with overflow and
multiple aggregation classes and provide a baseline
to demonstrate the effects of the bugfixes.
Testing:
Added planner tests:
* Cardinality estimates with multiple groups (both grouping and
non-grouping aggs).
* Overflow case from multiplication of grouping expr cardinality.
* Overflow handling of addition of multiple groups.
Change-Id: I59eaddbc5be253793293af064bb2d28a425564e1
Reviewed-on: http://gerrit.cloudera.org:8080/14131
Reviewed-by: Bikramjeet Vig <[email protected]>
Tested-by: Tim Armstrong <[email protected]>
> In AggregationNode.computeStats, handle cardinality overflow better
> -------------------------------------------------------------------
>
> Key: IMPALA-7604
> URL: https://issues.apache.org/jira/browse/IMPALA-7604
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.12.0
> Reporter: Paul Rogers
> Assignee: Tim Armstrong
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> Consider the cardinality overflow logic inĀ
> [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
> Current code:
> {noformat}
> // if we ended up with an overflow, the estimate is certain to be wrong
> if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done after looping over all conjuncts. It could be that, as a
> result, the number overflowed twice. The check should be done after each
> multiplication.
> * Since we know that the number overflowed, a better estimate of the total
> count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality
> of the first child. This is a worse estimate than using the max value, since
> the first child might have a low cardinality (it could be the later children
> that caused the overflow.)
> * If we really do expect overflow, then we are dealing with very large
> numbers. Being accurate to the row is not needed. Better to use a {{double}}
> which can handle the large values.
> Since overflow probably seldom occurs, this is not an urgent issue. Though,
> if overflow does occur, the query is huge, and having at least some estimate
> of the hugeness is better than none. Also, seems that this code probably
> evolved; this newbie is looking at it fresh and seeing that the accumulated
> fixes could be tidied up.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]