Stamatis Zampetakis created HIVE-24252:

             Summary: Improve decision model for using semijoin reducers
                 Key: HIVE-24252
             Project: Hive
          Issue Type: Improvement
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis

After a few experiments with the TPC-DS 10TB dataset, we observed that in some 
cases semijoin reducers were not effective: they either did not reduce the 
number of records at all or reduced the target relation only marginally. 

In some cases we can make the semijoin reducer more effective by adding more 
columns, but this also requires a bigger bloom filter, so deciding how many 
columns to include in the bloom filter becomes more delicate.
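
To illustrate why the filter grows, here is a minimal sketch using the 
textbook bloom filter sizing formula, m = -n * ln(p) / (ln 2)^2. It assumes 
the filter is sized from the expected number of distinct keys, and that a 
composite (multi-column) key has at least as many distinct values as any 
single column in it. The class name, the NDV figures, and the 
false-positive probability are hypothetical and are not taken from Hive.

```java
public final class BloomSizing {
    private BloomSizing() {}

    /**
     * Textbook optimal bit count for a bloom filter holding
     * expectedEntries keys at false-positive probability fpp.
     */
    public static long optimalBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need entries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        // Hypothetical distinct-value estimates, for illustration only.
        long singleColumnNdv = 1_000_000L;
        long compositeNdv = 50_000_000L;
        double fpp = 0.05;

        System.out.println("single-column bits: " + optimalBits(singleColumnNdv, fpp));
        System.out.println("multi-column bits:  " + optimalBits(compositeNdv, fpp));
    }
}
```

Under this formula the bit count grows linearly with the number of distinct 
keys, so a multi-column filter pays for its extra selectivity with memory 
and build time.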

The current decision model always chooses multi-column semijoin reducers if 
they are available, but this may not always be beneficial if a single column 
can already reduce the target relation significantly.
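
One possible shape for a more selective decision is sketched below. This is 
not Hive's actual implementation: the class, the enum, and both thresholds 
are hypothetical, and the selectivities (fraction of target rows that 
survive the reducer) are assumed to come from optimizer statistics. The 
idea is to drop reducers that barely filter anything and to pick the 
multi-column variant only when it removes noticeably more rows than the 
single-column one.

```java
public final class SemiJoinChoice {
    private SemiJoinChoice() {}

    public enum Choice { NONE, SINGLE_COLUMN, MULTI_COLUMN }

    /**
     * @param singleSel    estimated fraction of target rows kept by the single-column reducer
     * @param multiSel     estimated fraction of target rows kept by the multi-column reducer
     * @param minReduction minimum fraction of rows a reducer must remove to be worth creating
     * @param minExtraGain minimum additional fraction the multi-column reducer must remove
     */
    public static Choice choose(double singleSel, double multiSel,
                                double minReduction, double minExtraGain) {
        boolean singleUseful = (1.0 - singleSel) >= minReduction;
        boolean multiUseful = (1.0 - multiSel) >= minReduction;

        if (!singleUseful && !multiUseful) {
            return Choice.NONE;            // neither reducer filters enough rows
        }
        if (!singleUseful) {
            return Choice.MULTI_COLUMN;    // only the multi-column reducer helps
        }
        if (!multiUseful) {
            return Choice.SINGLE_COLUMN;   // only the single-column reducer helps
        }
        // Both help: pay for the bigger bloom filter only if it buys real extra filtering.
        return (singleSel - multiSel) >= minExtraGain
                ? Choice.MULTI_COLUMN
                : Choice.SINGLE_COLUMN;
    }
}
```

For example, with a single-column selectivity of 0.05 and a multi-column 
selectivity of 0.04, the extra columns remove only one more percent of the 
rows, so this sketch keeps the cheaper single-column reducer.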

This message was sent by Atlassian Jira
