[ https://issues.apache.org/jira/browse/IMPALA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099790#comment-17099790 ]

ASF subversion and git services commented on IMPALA-7659:
---------------------------------------------------------

Commit 56c20ac4639cab3d6e5e0ae77ce395934a9294f6 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=56c20ac ]

IMPALA-9699: Skip '-1' values when aggregating num_null statistics

This change partially reverts IMPALA-8566 to make Impala backward
compatible with old incremental partition stats. IMPALA-7659 added
collection of the number of nulls ('num_nulls') as a column statistic,
and IMPALA-8566 changed the initial value of the incremental partition
statistics from '-1' to '0', because starting from '-1' made the
estimates off by one.
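The off-by-one can be seen with a minimal sketch, assuming per-partition null counts are simply added to an accumulator whose starting value is the "initial value" mentioned above. The class and method names are hypothetical and this is not Impala's code.

```java
// Hypothetical sketch, not Impala's implementation: per-partition null
// counts are added to an accumulator that starts at a given initial value.
final class NumNullsInit {
    static long aggregate(long initial, long[] partitionNullCounts) {
        long total = initial;
        for (long n : partitionNullCounts) {
            total += n; // each partition contributes its own null count
        }
        return total;
    }
}
```

With partition counts {3, 4} (7 NULLs in total), starting at -1 yields 6, while starting at 0 yields the correct 7.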

Incremental stats written by an older release can make the table
metadata inaccessible in a new release when the column stats are
recomputed from the incremental partition stats, which a
partition-level 'COMPUTE INCREMENTAL STATS' can trigger. In this case
the old '-1' values can be aggregated into a 'num_nulls' value below
'-1', which can later fail a Precondition check during table load.
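A minimal sketch of this failure mode, assuming old incremental stats store '-1' to mean "unknown" and that the load-time Precondition accepts only values >= -1. Both the names and the exact check are assumptions for illustration, not Impala's code.

```java
// Hypothetical sketch of the failure mode: old incremental stats store -1
// ("unknown") for num_nulls, and a naive sum treats it as a real count.
final class LegacyNumNullsSum {
    // Assumed validity rule mirroring the load-time Precondition:
    // -1 means "unknown"; anything smaller is invalid.
    static boolean isValidNumNulls(long numNulls) {
        return numNulls >= -1;
    }

    // Naive aggregation: adds every stored value, including -1 sentinels.
    static long naiveSum(long[] partitionNumNulls) {
        long total = 0;
        for (long n : partitionNumNulls) {
            total += n;
        }
        return total;
    }
}
```

Two old partitions ({-1, -1}) sum to -2, which fails the assumed check. Mixing one old partition with a valid one ({-1, 5}) gives 4, which passes the check but is silently wrong.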

The new behavior ensures that if any incremental partition stat has a
value of '-1' for 'num_nulls', the aggregated stats will be '-1',
regardless of whether or not other partitions have valid values for
'num_nulls'. This prevents the planner from using incomplete
statistics, and users are notified about the missing statistics by
the general warning in the profile:
 'The following tables are missing relevant table and/or column
  statistics.'
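The aggregation rule described above can be sketched as follows, assuming per-partition 'num_nulls' values arrive as a plain array. The class name is hypothetical, and treating every negative value (not only '-1') as missing is an extra assumption beyond what this commit message states.

```java
// Hypothetical sketch of the described rule: if any partition's num_nulls
// is unknown, the aggregate is unknown; otherwise it is the plain sum.
final class NumNullsAggregation {
    static final long UNKNOWN = -1;

    static long aggregate(long[] partitionNumNulls) {
        long total = 0;
        for (long n : partitionNumNulls) {
            // Treats -1 (and, as an assumption, any other negative value)
            // as missing, so incomplete stats never reach the planner.
            if (n < 0) {
                return UNKNOWN;
            }
            total += n;
        }
        return total;
    }
}
```

For {3, -1, 4} the result is -1, for {3, 4} it is 7, and the result can never drop below -1, which keeps the table loadable.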

Testing:
 - Added a unit test to verify the accepted values and the aggregation result

Change-Id: I3fdf48a6c88378145078e068e12ade48c460f956
Reviewed-on: http://gerrit.cloudera.org:8080/15835
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins


> Collect count of nulls when collecting stats
> --------------------------------------------
>
>                 Key: IMPALA-7659
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7659
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>    Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>            Reporter: Piotr Findeisen
>            Assignee: Bharath Vissapragada
>            Priority: Major
>             Fix For: Impala 3.2.0
>
>
> When Impala calculates table stats, the NULL count is overwritten with -1.
> The number of NULLs in a table is useful information. Even if Impala does
> not benefit from it, some other tools do. Thus, not collecting this
> information may pose a problem for Impala users (potentially forcing them
> to run COMPUTE STATS elsewhere).
> Counting NULLs should be cheaper than counting NDVs. However, a code
> comment in {{ComputeStatsStmt.java}} suggests otherwise
> ([~tarmstrong] suggested this is because of IMPALA-7655).
> My suggestion would be to
> - improve the expression used to collect the NULL count
> - collect the NULL count during COMPUTE STATS
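
One way to collect the NULL count in the same scan as the NDV is to emit an extra aggregate per column. The sketch below is hypothetical: the class and method names are invented, identifier quoting is omitted, and it is not necessarily the expression {{ComputeStatsStmt.java}} generates. It relies only on Impala's NDV() aggregate and IF() conditional function; COUNT(*) - COUNT(col) would be an equivalent alternative.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: builds the select list of a stats query that
// computes NDV and the NULL count for each column in a single pass.
final class StatsQuerySketch {
    // IF(col IS NULL, 1, NULL) is non-NULL only for NULL rows, so COUNT()
    // of it equals the number of NULLs. Identifier quoting is omitted.
    static String nullCountExpr(String column) {
        return "COUNT(IF(" + column + " IS NULL, 1, NULL))";
    }

    static String selectList(List<String> columns) {
        StringJoiner joiner = new StringJoiner(", ");
        for (String col : columns) {
            joiner.add("NDV(" + col + ")");
            joiner.add(nullCountExpr(col));
        }
        return joiner.toString();
    }
}
```

For columns a and b this yields "NDV(a), COUNT(IF(a IS NULL, 1, NULL)), NDV(b), COUNT(IF(b IS NULL, 1, NULL))". A counter per column is far cheaper than the sketch state NDV needs.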



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
