Chao Sun created HIVE-15477:
-------------------------------

             Summary: Provide options to adjust filter stats when column stats 
are not available
                 Key: HIVE-15477
                 URL: https://issues.apache.org/jira/browse/HIVE-15477
             Project: Hive
          Issue Type: Bug
          Components: Statistics
    Affects Versions: 2.2.0
            Reporter: Chao Sun
            Assignee: Chao Sun


Currently, when column stats are not available, Hive assumes the "worst" 
case by setting the number of output rows to 1/2 of the number of input rows 
for each predicate expression. This can be inaccurate, especially when 
multiple predicates are chained by AND. We have found that in some cases this 
causes map join to pick the wrong ordering and thus fail with a memory issue.
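
To illustrate how quickly the estimate shrinks, here is a minimal sketch of 
the halving behavior described above. It is not Hive's actual code; the 
function name and the use of integer division are assumptions made for the 
example.

```python
def estimate_rows_halving(input_rows: int, num_predicates: int) -> int:
    """Illustrative only: halve the row estimate once per AND-ed predicate.

    Models the behavior described in this issue (no column stats available),
    not Hive's real implementation.
    """
    rows = input_rows
    for _ in range(num_predicates):
        rows //= 2  # each predicate is assumed to keep 1/2 of its input
    return rows


if __name__ == "__main__":
    # Four AND-ed predicates over 1,000,000 input rows.
    print(estimate_rows_halving(1_000_000, 4))  # prints 62500
```

With four predicates the estimate is already 1/16 of the input. If the 
predicates are in fact not very selective, the real output can be much larger 
than the estimate.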

One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
that can be used to control the percentage of rows emitted by a predicate 
expression. 
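
A rough sketch of how such a factor could enter the estimate is shown below. 
Only the config name {{hive.stats.filter.factor}} comes from this issue; the 
function, its parameters, and the validation range are hypothetical and only 
meant to show the idea.

```python
def estimate_rows_with_factor(input_rows: int,
                              num_predicates: int,
                              filter_factor: float = 0.5) -> int:
    """Hypothetical sketch: scale the row estimate by a configurable factor.

    filter_factor stands in for the proposed hive.stats.filter.factor value:
    the fraction of rows assumed to pass each predicate expression.
    """
    if not 0.0 < filter_factor <= 1.0:
        raise ValueError("filter_factor must be in (0, 1]")
    rows = float(input_rows)
    for _ in range(num_predicates):
        rows *= filter_factor  # applied once per predicate expression
    return int(rows)


if __name__ == "__main__":
    # 0.5 corresponds to the 1/2-per-predicate behavior described above.
    print(estimate_rows_with_factor(1_000_000, 3, 0.5))  # prints 125000
    # 1.0 assumes predicates filter nothing, the most conservative estimate.
    print(estimate_rows_with_factor(1_000_000, 3, 1.0))  # prints 1000000
```

A factor closer to 1.0 makes the estimate more conservative, which would make 
the planner less likely to treat a filtered table as small.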




