Tamas Mate created IMPALA-9699:
----------------------------------

             Summary: Skip '-1' values when aggregating num_null statistics
                 Key: IMPALA-9699
                 URL: https://issues.apache.org/jira/browse/IMPALA-9699
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
    Affects Versions: Impala 3.3.0
            Reporter: Tamas Mate
            Assignee: Tamas Mate


IMPALA-7659 added the population of NULL counts while computing stats. Later, 
IMPALA-8566 fixed an accuracy issue caused by the initialization of the 
statistics: the initial value was changed from '-1' to '0'. The fix also 
slightly changed how the values are summed. Previously, negative values were 
excluded from the sum:
{code:java}
if (num_new_nulls >= 0) num_nulls += num_new_nulls;
{code}
In the new implementation the condition was removed, because these values 
were no longer expected to be negative:
{code:java}
num_nulls += num_new_nulls;
{code}
This change does not cause any problem for stats created after the fix. 
However, it can make table metadata unavailable when a table carries stats 
written by an earlier release and is used with a newer one. If COMPUTE 
INCREMENTAL STATS is issued on a partition, the old '-1' values are added to 
the sum and can push the column-level num_nulls below '-1'. A num_nulls value 
smaller than '-1' later fails a precondition check when CatalogD tries to 
fetch the table metadata.
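
The effect can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical and are not taken from the Impala code base; the sketch only assumes that per-partition NULL counts are summed into one column-level value and that '-1' marks an uninitialized count from an earlier release.

```java
// Hypothetical illustration; names are not taken from the Impala code base.
final class NumNullsAggregation {
    // Sums per-partition NULL counts with no guard, mirroring
    // "num_nulls += num_new_nulls;".
    static long sumUnguarded(long[] perPartitionNumNulls) {
        long numNulls = 0;
        for (long numNewNulls : perPartitionNumNulls) {
            numNulls += numNewNulls;
        }
        return numNulls;
    }

    // Sums per-partition NULL counts but skips negative values, mirroring
    // "if (num_new_nulls >= 0) num_nulls += num_new_nulls;".
    static long sumSkippingNegatives(long[] perPartitionNumNulls) {
        long numNulls = 0;
        for (long numNewNulls : perPartitionNumNulls) {
            if (numNewNulls >= 0) numNulls += numNewNulls;
        }
        return numNulls;
    }

    public static void main(String[] args) {
        // Two partitions with old '-1' placeholders, one with a real count of 0.
        long[] counts = {-1, -1, 0};
        System.out.println(sumUnguarded(counts));         // prints -2
        System.out.println(sumSkippingNegatives(counts)); // prints 0
    }
}
```

With two '-1' placeholders the unguarded sum is -2, which is the kind of value that would fall below '-1', while the guarded sum stays at 0.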

Restoring the condition should not cause any problems, and we should put it 
back for backward compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
