Paul Rogers created IMPALA-7604:
-----------------------------------

             Summary: In AggregationNode.computeStats, handle cardinality 
overflow better
                 Key: IMPALA-7604
                 URL: https://issues.apache.org/jira/browse/IMPALA-7604
             Project: IMPALA
          Issue Type: Improvement
    Affects Versions: Impala 2.12.0
            Reporter: Paul Rogers


Consider the cardinality overflow logic in
[{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
 Current code:

{noformat}
    // if we ended up with an overflow, the estimate is certain to be wrong
    if (cardinality_ < 0) cardinality_ = -1;
{noformat}

This code has a number of issues.

* The check is done after looping over all conjuncts. By then the number may 
have overflowed more than once and wrapped back to a non-negative value, which 
the check would miss. The check should be done after each multiplication.
* Since we know that the number overflowed, a better estimate of the total 
count is {{Long.MAX_VALUE}}.
* The code later checks for the -1 value and, if found, uses the cardinality of 
the first child. This is a worse estimate than using the max value, since the 
first child might have a low cardinality (it could be the later children that 
caused the overflow).
* If we really do expect overflow, then we are dealing with very large numbers. 
Being accurate to the row is not needed. Better to use a {{double}} which can 
handle the large values.
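
To make the suggestions above concrete, here is a minimal sketch, not the actual Impala code. The class {{CardinalitySketch}} and its methods are hypothetical names invented for illustration, and the {{long[]}} input merely stands in for whatever per-expression counts {{computeStats()}} multiplies together. It checks for overflow on every multiplication, clamps to {{Long.MAX_VALUE}}, and shows a {{double}} variant.

```java
/** Hypothetical illustration only; none of these names exist in Impala. */
final class CardinalitySketch {
  private CardinalitySketch() {}

  /**
   * Multiplies two non-negative cardinalities. If the product does not fit
   * in a long, returns Long.MAX_VALUE instead of a wrapped-around value.
   */
  static long saturatingMultiply(long a, long b) {
    try {
      // Math.multiplyExact (Java 8+) throws ArithmeticException on overflow.
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  /** Overflow is checked after each multiplication, not once at the end. */
  static long estimate(long[] counts) {
    long cardinality = 1;
    for (long count : counts) {
      cardinality = saturatingMultiply(cardinality, count);
    }
    return cardinality;
  }

  /** Same product carried as a double: less precise, but no wrap-around. */
  static double estimateAsDouble(long[] counts) {
    double cardinality = 1.0;
    for (long count : counts) {
      cardinality *= (double) count;
    }
    return cardinality;
  }
}
```

For example, {{(1L << 32) * (1L << 32)}} wraps to 0 in plain {{long}} arithmetic, so a {{cardinality_ < 0}} test would not notice it; the sketch above returns {{Long.MAX_VALUE}} for that input.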

Since overflow probably occurs seldom, this is not an urgent issue. If 
overflow does occur, though, the query is huge, and having at least some 
estimate of the hugeness is better than none. Also, it seems that this code 
evolved over time; this newbie is looking at it fresh and seeing that the 
accumulated fixes could be tidied up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
