[https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337522#comment-15337522]
Lefty Leverenz commented on HIVE-14018:
---------------------------------------
Doc note: This adds *hive.stats.filter.in.factor* to HiveConf.java, so it will
need to be documented for releases 2.1.1 and 2.2.0.
* [Configuration Properties -- Statistics|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics]
Added TODOC2.1.1 and TODOC2.2 labels.
> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
> Key: HIVE-14018
> URL: https://issues.apache.org/jira/browse/HIVE-14018
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Minor
> Labels: TODOC2.1.1, TODOC2.2
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead
> of just dividing the incoming number of rows by 2). However, because the
> distribution of column values is assumed to be uniform, we might heavily
> underestimate or overestimate the resulting number of rows.
> This issue adds a factor that multiplies the IN clause estimate so we can
> alleviate this problem. The solution is not very elegant, but it is the
> best we can do until we have histograms to improve our estimate.
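The estimate described above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not Hive's actual implementation: the class InClauseEstimator and the method estimateRows are invented for this example, and the parameter inFactor only plays the role of hive.stats.filter.in.factor. It assumes a single column, a known number of distinct values (NDV) from column statistics, and a uniform value distribution, so each distinct IN value is expected to match numRows / NDV rows.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Hive's code): estimates the rows passing
 * "col IN (v1, ..., vk)" under a uniform-distribution assumption,
 * then scales the estimate by a tunable factor.
 */
public final class InClauseEstimator {

    private InClauseEstimator() {}

    /**
     * @param numRows  incoming row count
     * @param ndv      number of distinct values of the column (from column stats)
     * @param inValues literal values listed in the IN clause
     * @param inFactor multiplier in the role of hive.stats.filter.in.factor (assumed)
     * @return estimated output rows, clamped to [0, numRows]
     */
    public static long estimateRows(long numRows, long ndv, List<?> inValues, double inFactor) {
        if (numRows <= 0) {
            return 0L;
        }
        // Duplicate literals in the IN list do not select additional rows.
        long distinctValues = inValues.stream().distinct().count();
        // Uniform assumption: each distinct value covers numRows / ndv rows.
        // Without usable stats (ndv <= 0), fall back to no reduction.
        double selectivity = ndv > 0 ? Math.min(1.0, (double) distinctValues / ndv) : 1.0;
        // Apply the customizable correction factor.
        double estimate = numRows * selectivity * inFactor;
        // Never return more rows than came in, nor a negative count.
        return Math.max(0L, Math.min(numRows, Math.round(estimate)));
    }

    public static void main(String[] args) {
        // 1,000,000 rows, 100 distinct values, IN list with 3 values.
        System.out.println(estimateRows(1_000_000L, 100L, List.of("a", "b", "c"), 1.0)); // 30000
        System.out.println(estimateRows(1_000_000L, 100L, List.of("a", "b", "c"), 0.5)); // 15000
    }
}
```

A factor below 1.0 shrinks the estimate when the uniform assumption tends to overestimate; a factor above 1.0 grows it, with the result clamped so it never exceeds the incoming row count.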
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)