[ 
https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237359#comment-14237359
 ] 

Rui Li commented on HIVE-8536:
------------------------------

Hi [~xuefuz], my uncertainty is mainly about the number of map join tasks. 
There are some comments in {{GenMRSkewJoinProcessor}} regarding it:
{quote}
 Number of mr jobs to handle skew keys is the number of table minus 1 (we
 can stream the last table, so big keys in the last table will not be a
 problem).
{quote}
Honestly, I don't quite understand it. I think it makes the way we handle skew 
join depend on the join order. In the simplest case, where we join just two 
tables and only one of them is skewed, putting the skewed table first means the 
runtime skew join handles the skewed data with a map join; otherwise it's just 
a common join.
I think for now we can stick to MR's assumptions and methods for skew join, and 
see whether we can make any improvements later.
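
To make the ordering dependence concrete, here is a toy Python model of the 
behaviour described above. It is not Hive code: the function name and the 
list/set inputs are made up for illustration, and it only encodes the reading 
that every table except the last (streamed) one can get a follow-up map join 
for its skewed keys.

```python
def skew_join_follow_up_jobs(join_order, skewed_tables):
    """Return the tables whose skewed keys would get a follow-up map join.

    Hypothetical model: join_order is the list of table names in join order,
    skewed_tables is the set of tables that turn out to have skewed keys.
    """
    if len(join_order) < 2:
        return []  # nothing to join, so no follow-up jobs
    # Assumption taken from the quoted comment: the last table is streamed,
    # so at most len(join_order) - 1 tables can get a follow-up job.
    candidates = join_order[:-1]
    return [table for table in candidates if table in skewed_tables]


# Skewed table first: its skewed keys are handled by a follow-up map join.
print(skew_join_follow_up_jobs(["skewed", "small"], {"skewed"}))  # ['skewed']

# Skewed table last: no follow-up job, so it is just a common join.
print(skew_join_follow_up_jobs(["small", "skewed"], {"skewed"}))  # []
```

The same two tables and the same skew give different results depending only on 
which table comes last, which is the join-order dependence in question.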

> Enable SkewJoinResolver for spark [Spark Branch]
> ------------------------------------------------
>
>                 Key: HIVE-8536
>                 URL: https://issues.apache.org/jira/browse/HIVE-8536
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch, 
> HIVE-8536.3-spark.patch
>
>
> Sub-task of HIVE-8406



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
