Gopal V created HIVE-9931:
-----------------------------
Summary: Approximate nDV statistics from ORC bloom filter population
Key: HIVE-9931
URL: https://issues.apache.org/jira/browse/HIVE-9931
Project: Hive
Issue Type: Improvement
Components: Statistics
Affects Versions: 1.2.0
Reporter: Gopal V
The current CBO implementation requires column nDV (number of distinct values)
statistics to produce good estimates of JOIN selectivity and filter selectivity.
The ORC bloom filters provide an opportunity to estimate the net population of
a row-group, with false-positive rates capped for each row-group.
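As a rough illustration of such an estimate, the number of distinct values inserted into a bloom filter can be approximated from its set-bit count with the standard formula n ≈ -(m/k) ln(1 - X/m), where m is the size of the bit array, k the number of hash functions and X the number of set bits. The sketch below is not Hive or ORC code; the function name and its parameters are hypothetical, and it assumes the set-bit count of a row-group's filter is available to the caller.

```python
import math


def estimate_ndv(num_bits: int, num_hashes: int, set_bits: int) -> float:
    """Approximate how many distinct values were inserted into a bloom filter.

    Hypothetical helper for illustration only (not a Hive or ORC API).
    num_bits   -- m, total size of the bit array
    num_hashes -- k, number of hash functions applied per value
    set_bits   -- X, number of bits currently set to 1
    """
    if num_bits <= 0 or num_hashes <= 0:
        raise ValueError("num_bits and num_hashes must be positive")
    if not 0 <= set_bits <= num_bits:
        raise ValueError("set_bits must be between 0 and num_bits")
    if set_bits == 0:
        return 0.0
    if set_bits == num_bits:
        # A saturated filter carries no usable population information.
        return math.inf
    # n ~= -(m / k) * ln(1 - X / m)
    return -(num_bits / num_hashes) * math.log(1.0 - set_bits / num_bits)
```

The estimate loses precision as the filter fills up, since a small change in the set-bit count then maps to a large change in the estimated population.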
This is not useful for filter conditions or join conditions on columns whose
cardinality is a large fraction of the row-count, but it can collect viable
statistics for low-cardinality filter columns (de-normalization scenarios) or
for JOIN dimension columns of low cardinality (demographics or store location).
The challenge in this feature is in distinguishing between these two scenarios,
not in the derivation of the approximate nDV itself.
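One conceivable way to separate the two scenarios, offered purely as an assumption and not as something this issue proposes, is to accept an estimate only when the filter is far from saturated and the estimated nDV is a small fraction of the row count. The function name and both thresholds below are hypothetical and would need tuning against real data.

```python
def is_usable_low_cardinality(
    estimated_ndv: float,
    row_count: int,
    fill_ratio: float,
    max_ndv_fraction: float = 0.1,  # hypothetical threshold
    max_fill_ratio: float = 0.5,    # hypothetical threshold
) -> bool:
    """Decide whether a bloom-filter-derived nDV estimate looks trustworthy.

    Hypothetical heuristic for illustration only (not a Hive or ORC API).
    estimated_ndv -- approximate distinct count derived from the filter
    row_count     -- number of rows covered by the filter
    fill_ratio    -- fraction of the filter's bits that are set (0.0 to 1.0)
    """
    if row_count <= 0:
        return False
    if fill_ratio >= max_fill_ratio:
        # A heavily filled filter yields an unreliable population estimate.
        return False
    # Keep only columns whose distinct count is small relative to row count.
    return estimated_ndv <= max_ndv_fraction * row_count
```

For example, an estimate of 50 distinct values over 10,000 rows with 2% of the bits set would be accepted, while an estimate of 9,000 over the same rows with 90% of the bits set would be rejected.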