[jira] [Commented] (HIVE-14018) Make IN clause row selectivity estimation customizable

Lefty Leverenz (JIRA) Fri, 17 Jun 2016 21:40:24 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337522#comment-15337522
 ]


Lefty Leverenz commented on HIVE-14018:
---------------------------------------

Doc note:  This adds *hive.stats.filter.in.factor* to HiveConf.java, so it will 
need to be documented for releases 2.1.1 and 2.2.0.

* [Configuration Properties -- Statistics | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics]

Added TODOC2.1.1 and TODOC2.2 labels.

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>              Labels: TODOC2.1.1, TODOC2.2
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: HIVE-14018.1.patch, HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead 
> of just dividing the incoming number of rows by 2). However, because column 
> values are assumed to be uniformly distributed, we might end up heavily 
> underestimating or overestimating the resulting number of rows.
> This issue is to add a factor that multiplies the IN clause estimation so we 
> can alleviate this problem. The solution is not very elegant, but it is the 
> best we can do until we have histograms to improve our estimate.
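
As a rough illustration of the idea described above, here is a minimal sketch of how a multiplicative factor could adjust a uniform-distribution IN clause estimate. This is not Hive's actual code: the function name, its parameters, and the exact formula (rows * in-list size / distinct values * factor, clamped to the input row count) are assumptions made for illustration. Only the property name hive.stats.filter.in.factor comes from the issue itself.

```python
def estimate_in_rows(num_rows, ndv, in_list_size, factor=1.0):
    """Hypothetical estimate of rows surviving `col IN (...)`.

    Assumes column values are uniformly distributed, so each distinct
    value accounts for roughly num_rows / ndv rows. `factor` plays the
    role of a tunable multiplier such as hive.stats.filter.in.factor.
    """
    if num_rows <= 0 or ndv <= 0 or in_list_size <= 0:
        return 0
    # Uniform assumption: selectivity is the share of distinct values listed.
    selectivity = min(1.0, in_list_size / ndv)
    # Scale the estimate by the factor, then clamp to [0, num_rows].
    scaled = num_rows * selectivity * factor
    return int(round(min(float(num_rows), max(0.0, scaled))))


if __name__ == "__main__":
    # 1000 input rows, 4 distinct values, 1 value in the IN list.
    for f in (0.5, 1.0, 2.0, 10.0):
        print(f, estimate_in_rows(1000, 4, 1, factor=f))
```

A factor above 1 compensates when the listed values are known to be more frequent than the uniform assumption suggests, and a factor below 1 does the opposite. The clamp keeps the estimate from exceeding the number of incoming rows.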



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
