[ 
https://issues.apache.org/jira/browse/HIVE-29300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038917#comment-18038917
 ] 

Denys Kuzmenko commented on HIVE-29300:
---------------------------------------

Merged to master and cherry-picked to branch-4.2.

Thanks for the fix, [~thomas.rebele]!

> Wrong estimation for num rows in EXPLAIN with histogram statistics
> ------------------------------------------------------------------
>
>                 Key: HIVE-29300
>                 URL: https://issues.apache.org/jira/browse/HIVE-29300
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 4.1.0
>            Reporter: Thomas Rebele
>            Assignee: Thomas Rebele
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: stats_histogram2.q, stats_histogram2.q.out
>
>
> Given the query {{SELECT 1 FROM sh2a WHERE k1 < 10 AND k2 < 250}}, with a 
> selectivity of 0.02 for {{k1 < 10}} and 0.5 for {{k2 < 250}}, the combined 
> selectivity should be 0.02 * 0.5 = 0.01, i.e. an estimate of 5 of the 
> table's 500 rows.
> With histogram statistics enabled, the combined selectivity is instead 
> estimated from the last clause alone, at approximately 0.5, so the estimate 
> is 247 of 500 rows.
> Reason: 
> {{StatsRulesProcFactory.FilterStatsRule#evaluateComparatorWithHistogram}} 
> needs to estimate the selectivity of the condition and multiply it by 
> currNumRows.
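
The arithmetic in the description can be reproduced with a minimal sketch. This is not Hive's code: the function name apply_conjuncts and its chain flag are hypothetical, and the selectivity 0.494 for k2 < 250 is back-computed from the reported 247 of 500 rows rather than read from a histogram. It only illustrates one reading of the report, namely that each clause of an AND should scale the running row count (currNumRows) rather than the table total.

```python
def apply_conjuncts(total_rows, selectivities, chain):
    """Estimate the rows surviving an AND of predicates (illustrative only).

    chain=True:  each selectivity scales the running estimate, so the
                 clauses combine multiplicatively.
    chain=False: each selectivity scales the table total, so only the
                 last clause determines the result.
    """
    curr_num_rows = total_rows
    for selectivity in selectivities:
        base = curr_num_rows if chain else total_rows
        curr_num_rows = round(base * selectivity)
    return curr_num_rows


TOTAL_ROWS = 500
# 0.02 for k1 < 10 (from the description); 0.494 for k2 < 250 is an
# assumption derived from the reported estimate of 247 rows.
SELECTIVITIES = [0.02, 0.494]

chained = apply_conjuncts(TOTAL_ROWS, SELECTIVITIES, chain=True)
last_only = apply_conjuncts(TOTAL_ROWS, SELECTIVITIES, chain=False)

print("chained estimate:", chained)
print("last-clause-only estimate:", last_only)
```

With these inputs the chained variant yields the expected 5 rows, while the variant that ignores the running count yields the 247 rows seen in the EXPLAIN output.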


