[ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237359#comment-14237359 ]
Rui Li commented on HIVE-8536:
------------------------------

Hi [~xuefuz], my uncertainty is mainly about the number of map join tasks. There are some comments in {{GenMRSkewJoinProcessor}} regarding it:
{quote}
Number of mr jobs to handle skew keys is the number of table minus 1 (we can stream the last table, so big keys in the last table will not be a problem).
{quote}
Honestly, I don't quite understand it. I think this makes the way we handle skew join depend on the join order. In the simplest case, where we join just two tables and only one of them is skewed, if we put the skewed table first, the runtime skew join will handle the skewed data with a map join; otherwise it'll be just a common join.

I think for now we can stick to MR's assumptions and methods for skew join, and see whether we can make any improvements later.

> Enable SkewJoinResolver for spark [Spark Branch]
> ------------------------------------------------
>
>                 Key: HIVE-8536
>                 URL: https://issues.apache.org/jira/browse/HIVE-8536
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch, HIVE-8536.3-spark.patch
>
>
> Sub-task of HIVE-8406

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)