Piotr Findeisen created IMPALA-7659:
---------------------------------------
Summary: Collect count of nulls when collecting stats
Key: IMPALA-7659
URL: https://issues.apache.org/jira/browse/IMPALA-7659
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Piotr Findeisen
When Impala computes table stats, the NULL count is overwritten with -1.
The number of NULLs in a table is useful information. Even if Impala itself
does not benefit from it, other tools do. Not collecting it may therefore pose
a problem for Impala users (potentially forcing them to run COMPUTE STATS
elsewhere).
Counting NULLs should be a cheaper operation than counting NDVs.
However, a code comment in {{ComputeStatsStmt.java}} suggests otherwise
([~tarmstrong] suggested this is because of IMPALA-7655).
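To illustrate why the NULL count should be the cheaper of the two statistics, here is a minimal Python sketch. It is not Impala code: the function name {{column_stats}} and the in-memory list standing in for a column are assumptions made for illustration, and the exact set used for the NDV stands in for whatever distinct-count method an engine actually uses. The point is only that a NULL count needs one integer of state, while a distinct count needs state that grows with the data (or an approximation sketch).

```python
from typing import Hashable, Iterable, Optional, Tuple


def column_stats(values: Iterable[Optional[Hashable]]) -> Tuple[int, int]:
    """Return (null_count, exact_ndv) for one column in a single pass.

    Hypothetical helper for illustration only; not part of Impala.
    """
    null_count = 0    # constant-size state: a single counter
    distinct = set()  # state grows with the number of distinct values
    for v in values:
        if v is None:
            null_count += 1
        else:
            distinct.add(v)
    return null_count, len(distinct)


nulls, ndv = column_stats([1, None, 2, 2, None, 3])
print(nulls, ndv)  # prints "2 3"
```

Both statistics come out of the same scan, so collecting the NULL count alongside the NDV should in principle add little work per row.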
My suggestion would be to:
- improve the expression used to collect the NULL count
- collect the NULL count during COMPUTE STATS