Thomas Rebele created HIVE-29300:
------------------------------------

             Summary: Wrong estimation for num rows in EXPLAIN with histogram statistics
                 Key: HIVE-29300
                 URL: https://issues.apache.org/jira/browse/HIVE-29300
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.1.0
            Reporter: Thomas Rebele
         Attachments: stats_histogram2.q, stats_histogram2.q.out

Given the query {{SELECT 1 FROM sh2a WHERE k1 < 10 AND k2 < 250}}, with a 
selectivity of 0.02 for {{k1 < 10}} and 0.5 for {{k2 < 250}}, the combined 
selectivity should be 0.02 * 0.5 = 0.01, so the estimate should be 5 of the 
table's 500 rows.

With histogram statistics enabled, the combined selectivity is estimated from 
the last clause only, at approximately 0.5, so the estimate is 247 of the 500 
rows.

Reason:
{{StatsRulesProcFactory.FilterStatsRule#evaluateComparatorWithHistogram}} needs 
to estimate the selectivity of the condition and multiply it by 
{{currNumRows}}.
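
The arithmetic above can be illustrated with a minimal Python sketch. It is not Hive code: {{estimate_rows}} and its {{chain}} flag are hypothetical names, and the unchained branch only assumes that each clause's selectivity is applied to the table's total row count rather than to the running estimate, which would be consistent with the reported numbers but is not confirmed here. The predicates are treated as independent.

```python
def estimate_rows(num_rows, selectivities, chain=True):
    """Estimate the rows surviving a conjunction (AND) of predicates.

    num_rows      -- total rows in the table
    selectivities -- per-clause selectivity, in clause order
    chain         -- True: scale the running estimate (the currNumRows idea);
                     False: scale the table total each time (assumed faulty path)
    """
    curr_num_rows = num_rows
    for sel in selectivities:
        # The base is either the running estimate or the original table size.
        base = curr_num_rows if chain else num_rows
        curr_num_rows = round(sel * base)
    return curr_num_rows


# Selectivities from the report: k1 < 10 -> 0.02, k2 < 250 -> 0.5
expected = estimate_rows(500, [0.02, 0.5], chain=True)
last_clause_only = estimate_rows(500, [0.02, 0.5], chain=False)
print(expected, last_clause_only)
```

With an exact selectivity of 0.5 the unchained variant yields 250 rows; the 247 in the report reflects the histogram's approximate selectivity for {{k2 < 250}}. The chained variant yields the expected 5 rows.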



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
