[ 
https://issues.apache.org/jira/browse/IMPALA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7659:
------------------------------
    Affects Version/s: Impala 3.1.0
                       Impala 3.0
                       Impala 2.12.0

> Collect count of nulls when collecting stats
> --------------------------------------------
>
>                 Key: IMPALA-7659
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7659
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>    Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>            Reporter: Piotr Findeisen
>            Assignee: bharath v
>            Priority: Major
>             Fix For: Impala 3.2.0
>
>
> When Impala computes table stats, the NULL count gets overridden with -1. 
> The number of NULLs in a table is useful information. Even if Impala does not 
> benefit from it, some other tools do. Not collecting it may therefore pose a 
> problem for Impala users (potentially forcing them to run COMPUTE STATS 
> elsewhere).
> Counting NULLs should be cheaper than counting NDVs. However, a code comment 
> in {{ComputeStatsStmt.java}} suggests otherwise ([~tarmstrong] suggested this 
> is because of IMPALA-7655).
> My suggestion would be to
> - improve the expression used to collect the NULL count
> - collect the NULL count during COMPUTE STATS
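
To illustrate the cost argument in the description, here is a minimal Python sketch, not Impala code. It computes a NULL count and an exact distinct-value count for one column in a single pass, and builds one possible SQL expression for the NULL count. The names column_stats and null_count_expr are hypothetical, and COUNT(*) - COUNT(col) is only one standard SQL formulation; it is not claimed to be the expression Impala uses or adopted for this issue.

```python
from typing import Hashable, Iterable, Optional, Tuple


def column_stats(values: Iterable[Optional[Hashable]]) -> Tuple[int, int]:
    """Return (null_count, ndv) for one column in a single pass.

    Illustrative only: a real engine would typically estimate NDV with a
    sketch rather than an exact set.
    """
    null_count = 0
    distinct = set()
    for value in values:
        if value is None:
            # NULL counting: one comparison and one increment per row.
            null_count += 1
        else:
            # Distinct counting: hashing plus set maintenance per row.
            distinct.add(value)
    return null_count, len(distinct)


def null_count_expr(column: str) -> str:
    """Build a SQL expression that counts NULLs in `column`.

    Assumption: COUNT(col) skips NULLs (standard SQL), so the difference
    from COUNT(*) is the number of NULL rows.
    """
    return f"COUNT(*) - COUNT({column})"


if __name__ == "__main__":
    print(column_stats([1, None, 2, 2, None]))
    print(null_count_expr("c1"))
```

The per-row work for the NULL count is a single check, while the distinct count has to hash and store values, which is the intuition behind the claim that counting NULLs should be the cheaper of the two.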



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
