Thomas Rebele created HIVE-29300:
------------------------------------
Summary: Wrong estimation for num rows in EXPLAIN with histogram
statistics
Key: HIVE-29300
URL: https://issues.apache.org/jira/browse/HIVE-29300
Project: Hive
Issue Type: Bug
Affects Versions: 4.1.0
Reporter: Thomas Rebele
Attachments: stats_histogram2.q, stats_histogram2.q.out
Given a query {{SELECT 1 FROM sh2a WHERE k1 < 10 AND k2 < 250}}, with a
selectivity of 0.02 for {{k1 < 10}} and 0.5 for {{k2 < 250}}, the combined
selectivity should be 0.02 * 0.5 = 0.01, so the estimate should be 5 of the
table's 500 rows.
If we activate histograms, the combined selectivity is estimated from the last
clause only, at approximately 0.5, so the plan estimates that 247 of the 500
rows are selected.
Reason:
{{StatsRulesProcFactory.FilterStatsRule#evaluateComparatorWithHistogram}} needs
to estimate the selectivity of the condition and multiply it by
{{currNumRows}}, so that the reduction from the earlier clauses is preserved.
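
To illustrate the difference, here is a minimal standalone sketch. It is not Hive code: the class and method names are hypothetical, and the "last conjunct only" variant is only an assumption about how the wrong number could arise, based on the symptom described above. It uses the selectivities 0.02 and 0.5 from this report and treats the two clauses as independent.

```java
// Hypothetical sketch, not taken from Hive. Names and structure are assumptions.
final class SelectivitySketch {

    // Expected behaviour: each conjunct scales the rows that survived the
    // previous conjuncts (the running count plays the role of currNumRows).
    static long chained(long totalRows, double[] selectivities) {
        double currNumRows = totalRows;
        for (double s : selectivities) {
            currNumRows *= s;
        }
        return Math.round(currNumRows);
    }

    // Assumed faulty behaviour: each conjunct is estimated against the full
    // table, so the estimate for the last conjunct overwrites the earlier ones.
    static long lastConjunctOnly(long totalRows, double[] selectivities) {
        long estimate = totalRows;
        for (double s : selectivities) {
            estimate = Math.round(totalRows * s);
        }
        return estimate;
    }

    public static void main(String[] args) {
        double[] sel = {0.02, 0.5}; // k1 < 10, k2 < 250 (values from this report)
        System.out.println(chained(500, sel));          // prints 5
        System.out.println(lastConjunctOnly(500, sel)); // prints 250
    }
}
```

With the exact selectivity 0.5 the second variant yields 250; the 247 seen in the EXPLAIN output comes from the histogram's approximate estimate for {{k2 < 250}}.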
--
This message was sent by Atlassian Jira
(v8.20.10#820010)