[ 
https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427239#comment-13427239
 ] 

Namit Jain commented on HIVE-3086:
----------------------------------

@Yongqiang, the current skew join applies its optimization only after most of
the damage has already been done.
The reducer detects at run time that a particular key is skewed, and then
processes that key in a separate MR job.

In this approach, by contrast, we plan to know the skewed keys beforehand
(stored in the metastore), and then use them to do a map-join for the skewed
keys and a normal join for the other keys. This does require some change from
the user (the user needs to store the skewed keys in the metastore). However,
this approach can be very good for repetitive workloads: similar queries
running every day on similar data. Most probably, the skew does not change
every day, so it can be recalculated periodically.
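
To make the idea concrete, here is a rough sketch in Python (not Hive code) of
splitting a join by a known set of skewed keys. The function names, the
in-memory row format, and the reducer count are all assumptions made for
illustration; the real planner would rewrite the query plan rather than move
rows around like this.

```python
from collections import defaultdict

def map_join(big, small):
    """Stand-in for a map-join: hash the small side, stream the big side."""
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    return [(key, bv, sv) for key, bv in big for sv in table.get(key, [])]

def shuffle_join(left, right, num_reducers=4):
    """Stand-in for a common join: route rows to reducers by key hash."""
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for row in left:
        left_parts[hash(row[0]) % num_reducers].append(row)
    for row in right:
        right_parts[hash(row[0]) % num_reducers].append(row)
    out = []
    for reducer_id, rows in left_parts.items():
        # Each "reducer" joins only the rows that were routed to it.
        out.extend(map_join(rows, right_parts.get(reducer_id, [])))
    return out

def skew_aware_join(left, right, skewed_keys):
    """Join (key, value) rows, handling known skewed keys separately.

    skewed_keys is assumed to be known up front (e.g. read from metadata).
    """
    left_skewed = [r for r in left if r[0] in skewed_keys]
    left_rest = [r for r in left if r[0] not in skewed_keys]
    right_skewed = [r for r in right if r[0] in skewed_keys]
    right_rest = [r for r in right if r[0] not in skewed_keys]
    # Skewed keys never reach a reducer, so no single reducer is overloaded.
    return map_join(left_skewed, right_skewed) + shuffle_join(left_rest, right_rest)
```

For example, skew_aware_join(left, right, skewed_keys={"a"}) returns the same
rows as a plain join of left and right on the key, possibly in a different
order. The sketch assumes the rows of the second table for the skewed keys are
few enough to fit in memory, which is what makes the map-join path viable.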
                
> Skewed Join Optimization
> ------------------------
>
>                 Key: HIVE-3086
>                 URL: https://issues.apache.org/jira/browse/HIVE-3086
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Nadeem Moidu
>            Assignee: Nadeem Moidu
>
> During a join operation, if one of the columns has a skewed key, it can cause 
> that particular reducer to become the bottleneck. The following feature will 
> address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
