[
https://issues.apache.org/jira/browse/HIVE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuefu Zhang resolved HIVE-7958.
-------------------------------
Resolution: Won't Fix
The scenario no longer exists, so this fix is no longer needed.
> SparkWork generated by SparkCompiler may require multiple Spark jobs to run
> ---------------------------------------------------------------------------
>
> Key: HIVE-7958
> URL: https://issues.apache.org/jira/browse/HIVE-7958
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Priority: Critical
> Labels: Spark-M1
> Attachments: HIVE-7958-spark.patch
>
>
> A SparkWork instance may currently contain disjoint work graphs. For
> instance, union_remove_1.q may generate a plan like this:
> {code}
> Reduce 2 <- Map 1
> Reduce 4 <- Map 3
> {code}
> The SparkPlan instance generated from this work graph contains two result
> RDDs. When such a plan is executed, we call .foreach() on the two RDDs
> sequentially, which results in two Spark jobs, one after the other.
> While this works functionally, performance will not be great because the
> Spark jobs run sequentially rather than concurrently.
> Another side effect of this is that the corresponding SparkPlan instance is
> over-complicated.
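> For illustration, here is a minimal sketch (not Hive code; the class and RDD
> contents are made-up placeholders for the two disjoint branches above) of why
> this happens: each .foreach() call is a Spark action, so a separate job is
> submitted for each result RDD, and the second job only starts after the first
> finishes.
> {code}
> import java.util.Arrays;
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
>
> public class SequentialJobsSketch {
>   public static void main(String[] args) {
>     SparkConf conf = new SparkConf().setAppName("sketch").setMaster("local[2]");
>     JavaSparkContext sc = new JavaSparkContext(conf);
>
>     // Stands in for the result RDD of the "Reduce 2 <- Map 1" branch.
>     JavaRDD<Integer> result1 = sc.parallelize(Arrays.asList(1, 2, 3));
>     // Stands in for the result RDD of the "Reduce 4 <- Map 3" branch.
>     JavaRDD<Integer> result2 = sc.parallelize(Arrays.asList(4, 5, 6));
>
>     // Each foreach() is an action: Spark submits one job per call, and the
>     // second job is only submitted after the first one completes.
>     result1.foreach(x -> { });
>     result2.foreach(x -> { });
>
>     sc.stop();
>   }
> }
> {code}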
> There are two potential approaches:
> 1. Let SparkCompiler generate works that can each be executed in ONE Spark
> job. In the above example, two Spark tasks should be generated.
> 2. Let SparkPlanGenerator generate multiple Spark plans and have SparkClient
> execute them concurrently.
> Approach #1 seems more reasonable and fits naturally into our architecture.
> Also, Hive's task execution framework already takes care of task
> concurrency.
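> To make approach #1 concrete, here is a hypothetical sketch (the class and
> method names are assumptions for illustration, not the actual SparkCompiler
> code): split a disconnected work graph into its connected components, so that
> each component can become its own SparkTask and thus a single Spark job.
> {code}
> import java.util.*;
>
> public class SplitDisjointWorkSketch {
>
>   // Undirected adjacency view of the work graph, e.g. "Map 1" <-> "Reduce 2".
>   static List<Set<String>> connectedComponents(Map<String, List<String>> graph) {
>     List<Set<String>> components = new ArrayList<>();
>     Set<String> seen = new HashSet<>();
>     for (String start : graph.keySet()) {
>       if (seen.contains(start)) {
>         continue;
>       }
>       // A depth-first search from an unvisited node collects one component.
>       Set<String> component = new HashSet<>();
>       Deque<String> stack = new ArrayDeque<>();
>       stack.push(start);
>       while (!stack.isEmpty()) {
>         String node = stack.pop();
>         if (!seen.add(node)) {
>           continue;
>         }
>         component.add(node);
>         for (String next : graph.getOrDefault(node, Collections.<String>emptyList())) {
>           stack.push(next);
>         }
>       }
>       components.add(component);
>     }
>     return components;
>   }
>
>   public static void main(String[] args) {
>     Map<String, List<String>> graph = new HashMap<>();
>     graph.put("Map 1", Arrays.asList("Reduce 2"));
>     graph.put("Reduce 2", Arrays.asList("Map 1"));
>     graph.put("Map 3", Arrays.asList("Reduce 4"));
>     graph.put("Reduce 4", Arrays.asList("Map 3"));
>
>     // Prints two components; each would map to its own SparkTask / Spark job.
>     System.out.println(connectedComponents(graph));
>   }
> }
> {code}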
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)