Piotr Findeisen created IMPALA-7659:
---------------------------------------

             Summary: Collect count of nulls when collecting stats
                 Key: IMPALA-7659
                 URL: https://issues.apache.org/jira/browse/IMPALA-7659
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Piotr Findeisen


When Impala calculates table stats, the NULL count gets overridden with -1. 
The number of NULLs in a table is useful information. Even if Impala does not 
benefit from this information, some other tools do. Thus, not collecting it 
may pose a problem for Impala users (potentially forcing them to run 
COMPUTE STATS elsewhere).

Counting NULLs should be a cheaper operation than counting NDVs. 
However, a code comment in {{ComputeStatsStmt.java}} suggests otherwise 
([~tarmstrong] suggested this is because of IMPALA-7655).

My suggestion would be to:
- improve the expression used to collect the NULL count
- collect the NULL count during COMPUTE STATS
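
To illustrate what a cheap NULL-count expression could look like, here is a minimal sketch. It is not the code in {{ComputeStatsStmt.java}}; the class {{NullCountSketch}}, its methods, and the {{_nulls}} alias suffix are all hypothetical. It relies only on standard SQL semantics: COUNT(*) counts every row and COUNT(col) counts only non-NULL values, so their difference is the number of NULLs in that column.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not Impala's actual stats code.
final class NullCountSketch {

    // Wraps a column name in backticks.
    // Assumption: the name itself contains no backticks.
    static String quote(String ident) {
        return "`" + ident + "`";
    }

    // COUNT(*) counts all rows, COUNT(col) skips NULLs,
    // so the difference is the NULL count for that column.
    static String nullCountExpr(String column) {
        return "COUNT(*) - COUNT(" + quote(column) + ")";
    }

    // Builds one query that gathers NULL counts for all given columns
    // in a single pass. The table name is used as given, unquoted.
    static String buildQuery(String table, List<String> columns) {
        String selectList = columns.stream()
            .map(c -> nullCountExpr(c) + " AS " + quote(c + "_nulls"))
            .collect(Collectors.joining(", "));
        return "SELECT " + selectList + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("t", List.of("a", "b")));
    }
}
```

Whether this form is actually cheaper in Impala than the expression currently used would need to be measured, particularly given the behaviour discussed in IMPALA-7655.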



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
