[ https://issues.apache.org/jira/browse/PIG-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510114#comment-14510114 ]
Mohit Sabharwal commented on PIG-4518:
--------------------------------------
FYI, [~kellyzly], [~praveenr019], [~xuefuz], this patch re-works
{{SparkPOPackageAnnotator}} so that we can have a single {{SparkOperator}} for
every Spark job in {{SparkOperPlan}}.
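To illustrate the difference (a hypothetical sketch only, not Pig's actual compiler code — the operator names are borrowed from the description below): the old behavior splits a linear physical plan into a new operator at every shuffle boundary ({{POGlobalRearrange}}), while the proposed behavior keeps the complete plan in one {{SparkOperator}}, leaving the shuffle itself to Spark.

{code}
# Hypothetical sketch, not Pig's actual SparkCompiler logic.

def split_at_shuffles(plan):
    """Old behavior: start a new operator after each POGlobalRearrange."""
    operators, current = [], []
    for op in plan:
        current.append(op)
        if op == "POGlobalRearrange":
            operators.append(current)
            current = []
    if current:
        operators.append(current)
    return operators

def single_operator(plan):
    """Proposed behavior: one SparkOperator holds the complete plan,
    since Spark performs the shuffle itself (groupBy()/reduceByKey())."""
    return [list(plan)]

plan = ["POLoad", "POForEach", "POGlobalRearrange", "POPackage", "POStore"]
print(len(split_at_shuffles(plan)))  # 2 operators under the old scheme
print(len(single_operator(plan)))    # 1 operator for the whole Spark job
{code}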
> SparkOperator should correspond to complete Spark job
> -----------------------------------------------------
>
> Key: PIG-4518
> URL: https://issues.apache.org/jira/browse/PIG-4518
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4518.patch
>
>
> SparkPlan, which was added in PIG-4374, creates a new SparkOperator for every
> shuffle boundary (denoted by presence of POGlobalRearrange in the
> corresponding physical plan). This is unnecessary for the Spark engine, since it
> relies on Spark to do the shuffle (using groupBy(), reduceByKey() and
> CoGroupRDD) and does not need to explicitly identify "map" and "reduce"
> operations.
> It is also cleaner if a single SparkOperator represents a single complete
> Spark job.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)