[ 
https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224122#comment-14224122
 ] 

Rui Li commented on HIVE-8536:
------------------------------

Hi [~szehon],

{{skewjoin_noskew}} failed to run because our root stage (Stage-1) has two children, Stage-2 and Stage-0:
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-4 depends on stages: Stage-2, Stage-0
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-1
{noformat}
where Stage-2 is a {{DependencyCollectionTask}}.
The runtime skew join processing has a check that throws an exception when a task has more than one child:
{code}
    List<Task<? extends Serializable>> children = currTask.getChildTasks();
    if (children != null && children.size() > 1) {
      throw new SemanticException("Should not happened");
    }
{code}
This single child is later used to create {{ConditionalResolverSkewJoinCtx}}.
I think the MR code path assumes each task has at most one child task, which doesn't 
hold for Spark.
Shall we disable runtime skew join in this case, or extend the optimizer to 
support multiple children? And btw, why do we need this 
{{DependencyCollectionTask}}?
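
To make the failing condition concrete, here is a minimal standalone sketch that rebuilds the parent-to-children map from the stage dependencies listed above and reports which stages have more than one child. It doesn't use any Hive classes; {{StageChildren}}, {{planFromComment}}, {{childrenOf}} and {{stagesWithMultipleChildren}} are hypothetical names made up for this illustration, and the plan is simply hard-coded from the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain strings stand in for Hive tasks.
public class StageChildren {

    // The "depends on" relation copied from the STAGE DEPENDENCIES output.
    static Map<String, List<String>> planFromComment() {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("Stage-1", Collections.<String>emptyList()); // root stage
        dependsOn.put("Stage-2", Arrays.asList("Stage-1"));
        dependsOn.put("Stage-4", Arrays.asList("Stage-2", "Stage-0"));
        dependsOn.put("Stage-3", Arrays.asList("Stage-4"));
        dependsOn.put("Stage-0", Arrays.asList("Stage-1"));
        return dependsOn;
    }

    // Inverts "stage -> parents" into "stage -> children".
    static Map<String, List<String>> childrenOf(Map<String, List<String>> dependsOn) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String stage : dependsOn.keySet()) {
            children.put(stage, new ArrayList<String>());
        }
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            for (String parent : entry.getValue()) {
                children.get(parent).add(entry.getKey());
            }
        }
        return children;
    }

    // Stages that would trip a "children.size() > 1" style check.
    static List<String> stagesWithMultipleChildren(Map<String, List<String>> children) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : children.entrySet()) {
            if (entry.getValue().size() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = childrenOf(planFromComment());
        System.out.println("children: " + children);
        System.out.println("more than one child: " + stagesWithMultipleChildren(children));
    }
}
```

In this plan only Stage-1 is reported, which matches the failure: the check above would reject it if it is the task being processed.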

> Enable SkewJoinResolver for spark [Spark Branch]
> ------------------------------------------------
>
>                 Key: HIVE-8536
>                 URL: https://issues.apache.org/jira/browse/HIVE-8536
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch
>
>
> Sub-task of HIVE-8406



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
