[
https://issues.apache.org/jira/browse/PIG-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mohit Sabharwal updated PIG-4518:
---------------------------------
Attachment: PIG-4518.patch
> SparkOperator should correspond to complete Spark job
> -----------------------------------------------------
>
> Key: PIG-4518
> URL: https://issues.apache.org/jira/browse/PIG-4518
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4518.patch
>
>
> SparkPlan, which was added in PIG-4374, creates a new SparkOperator for every
> shuffle boundary (denoted by the presence of a POGlobalRearrange in the
> corresponding physical plan). This is unnecessary for the Spark engine, since
> it relies on Spark itself to perform the shuffle (using groupBy(), reduceByKey()
> and CoGroupRDD) and does not need to explicitly identify "map" and "reduce"
> operations.
> It is also cleaner if a single SparkOperator represents a single complete
> Spark job (a short illustrative sketch follows below).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)