Konstantin Bereznyakov created HIVE-29438:
---------------------------------------------
Summary: Statistics: inaccurate handling of unknown numNulls/numTrues/numFalses stats
Key: HIVE-29438
URL: https://issues.apache.org/jira/browse/HIVE-29438
Project: Hive
Issue Type: Bug
Reporter: Konstantin Bereznyakov
In some implementations, it can be very expensive to maintain fully accurate
column statistics, especially for large tables. The recommended value to use
when numNulls/numFalses/numTrues are unknown is -1.
However, the statistics code has a few places where it treats -1 as a
literal quantity rather than a sentinel value. This leads to inaccurate and
sometimes catastrophic estimates (e.g., incorrect join ordering, wrong memory
allocation for operators).
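
To illustrate the failure mode, here is a minimal sketch of sentinel-aware
arithmetic. It is not taken from the Hive code base: the class name, method
names, and the choice to fall back to numRows as an upper bound are all
assumptions made for the example. Used literally, -1 turns "numRows - numNulls"
into numRows + 1, and merging -1 with 100 yields 99 instead of "unknown".

```java
// Hypothetical helper, not part of Hive; names are illustrative only.
public final class NullCountSketch {
    // Sentinel meaning "statistic not available", as described in the issue.
    static final long UNKNOWN = -1L;

    private NullCountSketch() {
    }

    // Any negative value is treated as "unknown" rather than as a quantity.
    static boolean isKnown(long stat) {
        return stat >= 0;
    }

    // Merging two partial stats: the result is only known if both inputs are.
    static long mergeNumNulls(long a, long b) {
        if (!isKnown(a) || !isKnown(b)) {
            return UNKNOWN;
        }
        return a + b;
    }

    // Estimate of non-null rows. When numNulls is unknown, this sketch
    // assumes the conservative upper bound numRows (an assumed policy).
    static long nonNullRows(long numRows, long numNulls) {
        if (!isKnown(numNulls)) {
            return numRows;
        }
        // Clamp so a stale numNulls larger than numRows cannot go negative.
        return numRows - Math.min(numNulls, numRows);
    }
}
```

The same guard would apply to numTrues and numFalses. Whether an unknown
value should propagate as unknown or be replaced with a bound is a design
choice for each call site.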