[
https://issues.apache.org/jira/browse/PIG-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510383#comment-14510383
]
liyunzhang_intel commented on PIG-4518:
---------------------------------------
[~mohitsabharwal]:
I left some comments on the review board.
It is OK not to create a new SparkOperator when a POGlobalRearrange is
encountered, as long as no new unit test failures are introduced.
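For context on why the explicit shuffle boundary is unneeded: Spark's pair-RDD operations such as reduceByKey() perform the shuffle internally, so the caller never marks where the "map" side ends and the "reduce" side begins. A minimal Python sketch of the per-key fold that reduceByKey() performs (this is a conceptual illustration, not Spark code; the function name reduce_by_key is hypothetical):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, fn):
    # Group (key, value) pairs by key, then fold each group's values
    # with fn. Spark does the equivalent aggregation across partitions,
    # handling the shuffle internally -- no explicit map/reduce split
    # appears in the user-facing (or plan-level) API.
    buckets = defaultdict(list)
    for k, v in pairs:
        buckets[k].append(v)
    return {k: reduce(fn, vs) for k, vs in buckets.items()}

# Word-count style usage: sum the values for each key.
counts = reduce_by_key([("a", 1), ("b", 1), ("a", 1)], lambda x, y: x + y)
print(counts)  # {'a': 2, 'b': 1}
```

This is why a SparkOperator split at every POGlobalRearrange buys nothing on the Spark engine: the boundary that MapReduce needs spelled out is implicit in the shuffle operators themselves.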
> SparkOperator should correspond to complete Spark job
> -----------------------------------------------------
>
> Key: PIG-4518
> URL: https://issues.apache.org/jira/browse/PIG-4518
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4518.patch
>
>
> SparkPlan, which was added in PIG-4374, creates a new SparkOperator for every
> shuffle boundary (denoted by the presence of a POGlobalRearrange in the
> corresponding physical plan). This is unnecessary for the Spark engine, since
> it relies on Spark to do the shuffle (using groupBy(), reduceByKey() and
> CoGroupRDD) and does not need to explicitly identify "map" and "reduce"
> operations.
> It is also cleaner if a single SparkOperator represents a single complete
> Spark job.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)