[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768290#comment-15768290
 ] 

Prasanth Jayachandran commented on HIVE-15477:
----------------------------------------------

Do you want to fix the inequality case as well? It divides the row count by 3
in the worst case.
Other than that, the patch looks good to me. +1

> Provide options to adjust filter stats when column stats are not available
> --------------------------------------------------------------------------
>
>                 Key: HIVE-15477
>                 URL: https://issues.apache.org/jira/browse/HIVE-15477
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" 
> case by setting the number of output rows to 1/2 of the number of input 
> rows for each predicate expression. This can be inaccurate, especially in 
> the presence of multiple predicates chained by AND. We have found that in 
> some cases this could cause map join to have the wrong ordering and thus 
> fail with memory issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 
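
To make the arithmetic in the description concrete, here is a minimal sketch of how a per-predicate factor would compound across predicates chained by AND. It is not Hive's implementation: the function name `estimate_filter_rows` and its parameters are hypothetical, and it assumes each predicate scales the running row estimate by the same factor (0.5 reproduces the current 1/2 behavior). The inequality case mentioned in the comment above is not modeled.

```python
def estimate_filter_rows(input_rows: int, num_predicates: int, factor: float = 0.5) -> int:
    """Hypothetical row estimate for a filter when column stats are missing.

    Assumes every predicate in an AND chain multiplies the running estimate
    by `factor`, a stand-in for a setting like hive.stats.filter.factor.
    """
    if input_rows < 0 or num_predicates < 0:
        raise ValueError("input_rows and num_predicates must be non-negative")
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")

    rows = float(input_rows)
    for _ in range(num_predicates):
        # Each predicate is assumed to keep `factor` of the remaining rows.
        rows *= factor
    return round(rows)


# Three ANDed predicates over 1,000,000 input rows:
default_estimate = estimate_filter_rows(1_000_000, 3)       # 125000 with factor 0.5
relaxed_estimate = estimate_filter_rows(1_000_000, 3, 0.9)  # 729000 with factor 0.9
```

With the default factor, three predicates shrink the estimate to 1/8 of the input, which illustrates how an underestimated table could end up on the wrong side of a map join. A factor closer to 1 keeps the estimate more conservative.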



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
