[ https://issues.apache.org/jira/browse/IMPALA-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844274#comment-16844274 ]
Todd Lipcon commented on IMPALA-8566:
-------------------------------------
The issue is the initialization of PerColumnStats here:
{code:cpp}
PerColumnStats()
: intermediate_ndv(AggregateFunctions::HLL_LEN, 0), num_nulls(-1),
max_width(0), num_rows(0), avg_width(0) { }
{code}
Initializing num_nulls to {{-1}} instead of {{0}} means the accumulated null count ends up one lower than the true value in the end result.
> COMPUTE INCREMENTAL STATS sets num_nulls off-by-one
> ---------------------------------------------------
>
> Key: IMPALA-8566
> URL: https://issues.apache.org/jira/browse/IMPALA-8566
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
>
> IMPALA-7659 added the population of NULL counts while computing stats, but
> this functionality isn't working properly for incremental stats. The query is
> generated correctly, but the null count stored for the table is one lower
> than it should be. If the table has no nulls, this ends up storing a '-1'
> count, which is interpreted as 'unknown'. If the table does have nulls, the
> count is just off by one.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)