[ 
https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918573#comment-16918573
 ] 

ASF subversion and git services commented on IMPALA-7604:
---------------------------------------------------------

Commit 94f7d12f87f61cef6b341a5a1141b9bbe701de0b in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=94f7d12 ]

IMPALA-7604: part 2: fixes for AggregationNode cardinality

* Use saturating arithmetic in Expr.getNumDistinctValues() to
  avoid overflows.
* Avoid double-adding with checkedAdd()
* Fix incorrect logic with multiple groups - each group cannot
  return more than the input rows, but with multiple groups
  it can add up to more than the input rows.

Testing:
Updated planner tests from part 1 to reflect bugfixes.

Added targeted cardinality tests to verify behaviour
with and without stats.

Updated other planner tests that changed as a result of
this fixed.

Ran exhaustive tests.

Change-Id: Ieed41d60c0e0dfeca64035e919cb8c28a054a9ab
Reviewed-on: http://gerrit.cloudera.org:8080/14132
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> In AggregationNode.computeStats, handle cardinality overflow better
> -------------------------------------------------------------------
>
>                 Key: IMPALA-7604
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7604
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Paul Rogers
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Consider the cardinality overflow logic inĀ 
> [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
>  Current code:
> {noformat}
>     // if we ended up with an overflow, the estimate is certain to be wrong
>     if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done after looping over all conjuncts. It could be that, as a 
> result, the number overflowed twice. The check should be done after each 
> multiplication.
> * Since we know that the number overflowed, a better estimate of the total 
> count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality 
> of the first child. This is a worse estimate than using the max value, since 
> the first child might have a low cardinality (it could be the later children 
> that caused the overflow.)
> * If we really do expect overflow, then we are dealing with very large 
> numbers. Being accurate to the row is not needed. Better to use a {{double}} 
> which can handle the large values.
> Since overflow probably seldom occurs, this is not an urgent issue. Though, 
> if overflow does occur, the query is huge, and having at least some estimate 
> of the hugeness is better than none. Also, seems that this code probably 
> evolved; this newbie is looking at it fresh and seeing that the accumulated 
> fixes could be tidied up.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to