[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401262#comment-13401262 ]
alex gemini commented on HIVE-3086:
-----------------------------------

The design is very complicated, IMO. What if we have a big table, logs, and a small table, users, and the users table has a column 'age'? If we issue a query that is skewed by age, we can't pre-partition the big table for it. This design doesn't handle that case, right?

I guess what we want is custom partitioning at runtime. For the above example, we need a custom partition (or some hint) to tell the query plan that we want to partition the users table on the 'userid, age' columns and the logs table on the 'userid' column. The partition number for the same userid needs to be the same in both tables so that they can be joined afterwards.

> Skewed Join Optimization
> ------------------------
>
>                 Key: HIVE-3086
>                 URL: https://issues.apache.org/jira/browse/HIVE-3086
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Nadeem Moidu
>            Assignee: Nadeem Moidu
>
> During a join operation, if one of the columns has a skewed key, it can cause
> that particular reducer to become the bottleneck. The following feature will
> address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
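
To make the co-partitioning requirement from the comment concrete, here is a minimal Python sketch. It is not Hive code and does not reflect the HIVE-3086 design. The table contents, the partition count, and the helper names (partition_for, partition_rows, co_partitioned_join) are all assumptions made for illustration. It only shows that when both tables derive the partition number from userid with the same function and the same partition count, each partition can be joined on its own. The 'age' component of the proposed users partitioning is not modelled.

```python
import zlib
from collections import defaultdict

# Assumed partition count; it must be identical for both tables.
NUM_PARTITIONS = 4


def partition_for(userid, num_partitions=NUM_PARTITIONS):
    """Return a deterministic partition number derived from userid only."""
    # crc32 is used instead of hash() because hash() of a str varies per process.
    return zlib.crc32(userid.encode("utf-8")) % num_partitions


def partition_rows(rows, num_partitions=NUM_PARTITIONS):
    """Group rows by the partition number of their userid."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_for(row["userid"], num_partitions)].append(row)
    return parts


def co_partitioned_join(users, logs, num_partitions=NUM_PARTITIONS):
    """Inner-join logs with users one partition at a time."""
    user_parts = partition_rows(users, num_partitions)
    log_parts = partition_rows(logs, num_partitions)
    joined = []
    for p in range(num_partitions):
        # Matching userids are guaranteed to be in the same partition p.
        users_by_id = {u["userid"]: u for u in user_parts.get(p, [])}
        for log in log_parts.get(p, []):
            user = users_by_id.get(log["userid"])
            if user is not None:
                joined.append({**log, "age": user["age"]})
    return joined


# Hypothetical sample data: a small users table and a larger logs table.
users = [
    {"userid": "u1", "age": 30},
    {"userid": "u2", "age": 41},
]
logs = [
    {"userid": "u1", "event": "login"},
    {"userid": "u1", "event": "click"},
    {"userid": "u2", "event": "login"},
    {"userid": "u3", "event": "login"},  # no matching user, dropped by the join
]

result = co_partitioned_join(users, logs)
```

If the two tables used different partition counts or different hash functions, rows for the same userid could land in different partitions and the per-partition join would miss them, which is the constraint the comment points out.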