[ https://issues.apache.org/jira/browse/HIVE-29300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038917#comment-18038917 ]
Denys Kuzmenko commented on HIVE-29300:
---------------------------------------
Merged to master and cherry-picked to branch-4.2
Thanks for the fix, [~thomas.rebele]!
> Wrong estimation for num rows in EXPLAIN with histogram statistics
> ------------------------------------------------------------------
>
> Key: HIVE-29300
> URL: https://issues.apache.org/jira/browse/HIVE-29300
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 4.1.0
> Reporter: Thomas Rebele
> Assignee: Thomas Rebele
> Priority: Minor
> Labels: pull-request-available
> Attachments: stats_histogram2.q, stats_histogram2.q.out
>
>
> Given a query {{SELECT 1 FROM sh2a WHERE k1 < 10 AND k2 < 250}}, with a
> selectivity of 0.02 for {{k1 < 10}} and 0.5 for {{k2 < 250}}, the
> combined selectivity should be 0.02 * 0.5 = 0.01, which selects 5 of the
> table's 500 rows.
> If histograms are enabled, the combined selectivity is instead estimated
> from the last clause alone, at approximately 0.5, so the plan estimates
> that 247 of the 500 rows are selected.
> Reason:
> {{StatsRulesProcFactory.FilterStatsRule#evaluateComparatorWithHistogram}}
> needs to estimate the selectivity of the condition and multiply it by
> currNumRows.
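
To make the arithmetic in the description concrete, here is a minimal sketch in Python. It is not Hive's code: the function names are hypothetical, it uses the rounded selectivities 0.02 and 0.5 from the description rather than real histogram output, and it assumes currNumRows holds the rows that survive the earlier conjuncts.

```python
def apply_selectivity(curr_num_rows: int, selectivity: float) -> int:
    """Scale the rows that survived earlier conjuncts by one conjunct's selectivity."""
    return round(curr_num_rows * selectivity)


def estimate_chained(total_rows: int, selectivities: list[float]) -> int:
    """Apply each conjunct to the running row count (the intended behaviour)."""
    curr_num_rows = total_rows
    for selectivity in selectivities:
        curr_num_rows = apply_selectivity(curr_num_rows, selectivity)
    return curr_num_rows


def estimate_last_only(total_rows: int, selectivities: list[float]) -> int:
    """Apply only the last conjunct to the full table (the reported symptom)."""
    return apply_selectivity(total_rows, selectivities[-1])


if __name__ == "__main__":
    total_rows = 500
    selectivities = [0.02, 0.5]  # k1 < 10, then k2 < 250 (values from the description)
    print(estimate_chained(total_rows, selectivities))    # 5
    print(estimate_last_only(total_rows, selectivities))  # 250
```

The second figure is 250 rather than the reported 247 only because the sketch uses the rounded selectivity 0.5 instead of the histogram's approximate value.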
--
This message was sent by Atlassian Jira
(v8.20.10#820010)