Konstantin Bereznyakov created HIVE-29534:
---------------------------------------------
Summary: Statistics: StatsUtils::getColStatistics does not set
NDV/some other fields for DATE/TIMESTAMP columns
Key: HIVE-29534
URL: https://issues.apache.org/jira/browse/HIVE-29534
Project: Hive
Issue Type: Bug
Reporter: Konstantin Bereznyakov
Technically, the method is missing stats for multiple data types. The most
important omission appears to be the setCountDistint() call for DATE_TYPE_NAME
and TIMESTAMP_TYPE_NAME.
The TIMESTAMP data type could also benefit from setBitVectors(), for which the
information also appears to be available.
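Below is a minimal sketch of the kind of change the issue asks for. So that it compiles on its own, it is written against hypothetical stand-in classes (`ColStats`, `DateLikeStats`, `toColStats`) instead of Hive's real `ColStatistics` and metastore types. It assumes that the date/timestamp statistics payload exposes an NDV count and serialized bit vectors. The DATE branch adds only the missing NDV. The TIMESTAMP branch also copies the bit vectors, as suggested above.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified model of the fix proposed in HIVE-29534.
 * ColStats, DateLikeStats and toColStats are illustrative stand-ins,
 * NOT Hive's real ColStatistics / metastore statistics classes.
 */
public class ColStatsSketch {

    /** Stand-in for the per-column statistics consumed by the planner. */
    static final class ColStats {
        final String colType;
        long countDistinct; // NDV; stays 0 if never set (the reported bug)
        long numNulls;
        byte[] bitVectors;  // serialized NDV sketch, if any

        ColStats(String colType) { this.colType = colType; }
    }

    /** Stand-in (assumed shape) for the metastore's date/timestamp stats. */
    static final class DateLikeStats {
        final long numDVs;
        final long numNulls;
        final byte[] bitVectors;

        DateLikeStats(long numDVs, long numNulls, byte[] bitVectors) {
            this.numDVs = numDVs;
            this.numNulls = numNulls;
            this.bitVectors = bitVectors;
        }
    }

    static ColStats toColStats(String colType, DateLikeStats s) {
        ColStats cs = new ColStats(colType);
        switch (colType.toLowerCase(Locale.ROOT)) {
            case "date":
                cs.numNulls = s.numNulls;
                cs.countDistinct = s.numDVs; // the missing setCountDistint() equivalent
                break;
            case "timestamp":
                cs.numNulls = s.numNulls;
                cs.countDistinct = s.numDVs; // the missing setCountDistint() equivalent
                cs.bitVectors = s.bitVectors; // the suggested setBitVectors() equivalent
                break;
            default:
                // Other types are out of scope for this sketch.
                break;
        }
        return cs;
    }

    public static void main(String[] args) {
        DateLikeStats stats = new DateLikeStats(365, 3, new byte[] {1, 2, 3});
        ColStats date = toColStats("date", stats);
        ColStats ts = toColStats("timestamp", stats);
        System.out.println(date.countDistinct + " " + ts.countDistinct + " " + ts.bitVectors.length); // prints 365 365 3
    }
}
```

Lowercasing the type name before the switch mirrors the idea of comparing against lowercase type-name constants such as DATE_TYPE_NAME. The exact comparison used in StatsUtils is not reproduced here.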
Because of these missing calls, the NDV of DATE and TIMESTAMP columns is left
at 0, which could negatively impact execution planning for some queries.
[https://github.com/apache/hive/blob/bbd83dff5bfc8b8ce018476391469da3331216dd/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L840]
[https://github.com/apache/hive/blob/bbd83dff5bfc8b8ce018476391469da3331216dd/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L870]
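A worked example helps show why an NDV of 0 matters. It uses the textbook equi-join cardinality estimate |R| * |S| / max(NDV(R.a), NDV(S.b)). The helper `estimateJoinRows` is hypothetical and illustrates only that formula, not Hive's actual estimator. When both NDVs are 0, the formula has no meaningful value, so an optimizer must fall back to some other heuristic.

```java
public class JoinEstimateSketch {

    /**
     * Textbook equi-join estimate: |R| * |S| / max(ndvR, ndvS).
     * Hypothetical helper for illustration; not Hive's estimator.
     * Returns -1 when both NDVs are 0, i.e. the stats are unusable.
     */
    static long estimateJoinRows(long rowsR, long rowsS, long ndvR, long ndvS) {
        long maxNdv = Math.max(ndvR, ndvS);
        if (maxNdv <= 0) {
            return -1; // NDV missing (reported as 0): no sound estimate
        }
        return rowsR * rowsS / maxNdv;
    }

    public static void main(String[] args) {
        // 1,000,000 fact rows joined to a 365-row date dimension on a DATE key
        System.out.println(estimateJoinRows(1_000_000L, 365L, 365L, 365L)); // prints 1000000
        System.out.println(estimateJoinRows(1_000_000L, 365L, 0L, 0L));     // prints -1
    }
}
```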
Adding this information seems to change the output of about 100 .out files.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)