[ https://issues.apache.org/jira/browse/IMPALA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bharath v resolved IMPALA-7659.
-------------------------------
Resolution: Fixed
Assignee: bharath v
Fix Version/s: Impala 3.2.0
> Collect count of nulls when collecting stats
> --------------------------------------------
>
> Key: IMPALA-7659
> URL: https://issues.apache.org/jira/browse/IMPALA-7659
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
> Reporter: Piotr Findeisen
> Assignee: bharath v
> Priority: Major
> Fix For: Impala 3.2.0
>
>
> When Impala calculates table stats, the NULL count gets overridden with -1.
> The number of NULLs in a table is useful information. Even if Impala does not
> benefit from this information, some other tools do. Thus, not collecting this
> information may pose a problem for Impala users (potentially forcing them to
> run COMPUTE STATS elsewhere).
> Counting NULLs should be a cheaper operation than counting
> NDVs. However, a code comment in {{ComputeStatsStmt.java}} suggests otherwise
> ([~tarmstrong] suggested this is because of IMPALA-7655).
> My suggestion would be to:
> - improve the expression used to collect the NULL count
> - collect the NULL count during COMPUTE STATS
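
For illustration only, here is a minimal sketch of one cheap way to express a per-column NULL count in SQL. It is not the expression Impala uses or the fix that was committed. It runs against an in-memory SQLite database as a stand-in engine, and the helper name `null_counts` and the table `t` are hypothetical. The idea is that `COUNT(*)` counts every row while `COUNT(col)` skips NULLs, so their difference is the NULL count, and all columns can be covered in a single scan.

```python
import sqlite3


def null_counts(conn, table, columns):
    """Return {column: number of NULLs} using a single scan of the table.

    Hypothetical helper for illustration; identifiers are assumed to be trusted.
    """
    # COUNT(*) counts all rows, COUNT(col) ignores NULLs,
    # so the difference is the number of NULLs in that column.
    exprs = ", ".join(f'COUNT(*) - COUNT("{c}")' for c in columns)
    row = conn.execute(f'SELECT {exprs} FROM "{table}"').fetchone()
    return dict(zip(columns, row))


# SQLite stands in for the real engine here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (None, "y"), (3, None), (None, "z")],
)
print(null_counts(conn, "t", ["a", "b"]))
```

An equivalent formulation is `SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)`. Which form is cheaper in a given engine depends on how it evaluates aggregates, so this is worth measuring rather than assuming.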
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)