[ https://issues.apache.org/jira/browse/PIG-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510114#comment-14510114 ]
Mohit Sabharwal commented on PIG-4518:
--------------------------------------
FYI, [~kellyzly], [~praveenr019], [~xuefuz], this patch re-works
{{SparkPOPackageAnnotator}} so that we can have a single {{SparkOperator}} for
every Spark job in {{SparkOperPlan}}.
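To illustrate the difference (a hypothetical sketch only, not Pig's actual compiler code — the operator names are borrowed from the description below): the old behavior splits a linear physical plan into a new operator at every shuffle boundary ({{POGlobalRearrange}}), while the proposed behavior keeps the complete plan in one {{SparkOperator}}, leaving the shuffle itself to Spark.

{code}
# Hypothetical sketch, not Pig's actual SparkCompiler logic.

def split_at_shuffles(plan):
    """Old behavior: start a new operator after each POGlobalRearrange."""
    operators, current = [], []
    for op in plan:
        current.append(op)
        if op == "POGlobalRearrange":
            operators.append(current)
            current = []
    if current:
        operators.append(current)
    return operators

def single_operator(plan):
    """Proposed behavior: one SparkOperator holds the complete plan,
    since Spark performs the shuffle itself (groupBy()/reduceByKey())."""
    return [list(plan)]

plan = ["POLoad", "POForEach", "POGlobalRearrange", "POPackage", "POStore"]
print(len(split_at_shuffles(plan)))  # 2 operators under the old scheme
print(len(single_operator(plan)))    # 1 operator for the whole Spark job
{code}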
> SparkOperator should correspond to complete Spark job
> -----------------------------------------------------
>
> Key: PIG-4518
> URL: https://issues.apache.org/jira/browse/PIG-4518
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4518.patch
>
>
> SparkPlan, which was added in PIG-4374, creates a new SparkOperator for every
> shuffle boundary (denoted by presence of POGlobalRearrange in the
> corresponding physical plan). This is unnecessary for the Spark engine, since it
> relies on Spark to do the shuffle (using groupBy(), reduceByKey() and
> CoGroupRDD) and does not need to explicitly identify "map" and "reduce"
> operations.
> It is also cleaner if a single SparkOperator represents a single complete
> Spark job.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)