[
https://issues.apache.org/jira/browse/IMPALA-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099788#comment-17099788
]
ASF subversion and git services commented on IMPALA-9699:
---------------------------------------------------------
Commit 56c20ac4639cab3d6e5e0ae77ce395934a9294f6 in impala's branch
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=56c20ac ]
IMPALA-9699: Skip '-1' values when aggregating num_null statistics
This change partially reverts IMPALA-8566 to make Impala backward
compatible with the old incremental partition stats. IMPALA-7659 added
collection of the null count ('num_nulls') statistics, and IMPALA-8566
changed the initial value of the incremental partition statistics from
'-1' to '0', because with '-1' the estimates were off by 1.
Old statistics carried over into a new release can make the table
metadata inaccessible when the column stats are recomputed from the
incremental partition stats, which a partition-level
'COMPUTE INCREMENTAL STATS' can trigger. In this case the old '-1'
values can be summed into a 'num_nulls' value below '-1', which later
fails a Precondition check during table load.
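
The failure mode described above can be illustrated with a minimal sketch.
The class and method names here are hypothetical and are not Impala's
actual code; the validity rule (anything below '-1' is rejected) is an
assumption taken from the description of the Precondition check.

```java
// Hypothetical illustration only; not Impala's real aggregation code.
final class UnguardedNumNullsSum {
    /** Sums per-partition null counts starting from 0, with no guard for '-1'. */
    static long sum(long[] perPartitionNumNulls) {
        long total = 0;
        for (long n : perPartitionNumNulls) {
            total += n; // an old '-1' ("unknown") is added like a real count
        }
        return total;
    }

    /** Assumed load-time rule: '-1' means unknown, lower values are invalid. */
    static boolean isValidNumNulls(long numNulls) {
        return numNulls >= -1;
    }

    public static void main(String[] args) {
        long[] oldPartitionStats = {-1, -1, -1}; // stats written by an older release
        long total = sum(oldPartitionStats);
        System.out.println(total + " valid=" + isValidNumNulls(total)); // prints "-3 valid=false"
    }
}
```

Three partitions that each carry the old '-1' marker sum to '-3', which
the assumed validity rule rejects.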
The new behavior ensures that if any incremental partition stat has a
value of '-1' for 'num_nulls', the aggregated stats will be '-1',
regardless of whether or not other partitions have valid values for
'num_nulls'. This will prevent the planner from utilizing incomplete
statistics and the users will be notified about the missing statistics
with the general warning in the profile:
'The following tables are missing relevant table and/or column
statistics.'
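
A minimal sketch of the aggregation rule described above, assuming the
per-partition 'num_nulls' values are available as a plain array. The
class name, method name and UNKNOWN constant are hypothetical; the real
change lives in Impala's own code and may differ in detail.

```java
// Hypothetical illustration only; not Impala's real aggregation code.
final class NumNullsAggregation {
    /** Marker for a missing null count, as written by older releases. */
    static final long UNKNOWN = -1;

    /**
     * Returns the sum of the per-partition null counts, or UNKNOWN as soon
     * as any partition reports '-1'. Values below '-1' are not handled here.
     */
    static long aggregate(long[] perPartitionNumNulls) {
        long total = 0;
        for (long n : perPartitionNumNulls) {
            if (n == UNKNOWN) {
                return UNKNOWN; // one missing value makes the whole aggregate unknown
            }
            total += n;
        }
        return total;
    }
}
```

With this rule, aggregate(new long[]{3, -1, 5}) yields '-1' even though
two partitions have valid counts, while aggregate(new long[]{3, 0, 5})
yields '8'.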
Testing:
- Added unit test to verify the accepted values and aggregation result
Change-Id: I3fdf48a6c88378145078e068e12ade48c460f956
Reviewed-on: http://gerrit.cloudera.org:8080/15835
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Skip '-1' values when aggregating num_null incremental statistics
> -----------------------------------------------------------------
>
> Key: IMPALA-9699
> URL: https://issues.apache.org/jira/browse/IMPALA-9699
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.3.0
> Reporter: Tamas Mate
> Assignee: Tamas Mate
> Priority: Major
> Labels: backwards-compatibility
>
> IMPALA-7659 added the population of NULL counts while computing stats. Later,
> IMPALA-8566 fixed an accuracy issue caused by the initialization of
> statistics: the initial value was changed from '-1' to '0'. The fix also
> slightly changed how the values are summed. Previously, negative values
> were excluded from the sum:
> {code:java}
> if (num_new_nulls >= 0) num_nulls += num_new_nulls;
> {code}
> while in the new implementation, as these values should not be negative, the
> condition was removed:
> {code:java}
> num_nulls += num_new_nulls;
> {code}
> This change causes no problems for stats created after the fix; however, it
> can make table metadata unavailable when stats written by an earlier release
> are read by a newer one. The metadata can become invalid if a compute
> incremental stats is issued on a partition, because the old '-1' values can
> push the column-level num_nulls below '-1'. A num_nulls value smaller than
> '-1' then fails a precondition check when CatalogD tries to fetch the table
> metadata.
> The condition itself is harmless, so for backward compatibility we should
> put it back.