[ 
https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606743#comment-14606743
 ] 

liyunzhang_intel commented on PIG-4594:
---------------------------------------

[~mohitsabharwal]:
{quote}
In case 3 above (multiple splitees), looks like we could use RDD.cache() to 
cache the output of b in your example.
Because, otherwise, since each Store corresponds to a Spark action, the entire 
RDD lineage will be computed twice, once for each Store.
{quote}
This seems to be covered by the following item in the [PigOnSpark MileStone 
doc|https://docs.google.com/document/d/1R7O8BctJTHdMPlSy8A2imThRmhDtC2UB0HfWEsX2NGM/edit#heading=h.desnzoc5g4cs]:

Re-design Spark Plan
Currently, the SparkLauncher converts the SparkPlan to RDD pipeline and 
immediately executes it. There is no intermediate step that allows optimization 
of the RDD pipeline, if so deemed necessary, before execution. This will need 
re-working of sparkPlanToRDD(), perhaps by introduction of a RDDPlan of 
RDDOperators. 

I think that once the SparkPlan redesign is implemented, we can use 
RDD.cache() to cache the output of b in case 3 as an optimization.
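
To illustrate why caching b matters when there are several Stores, here is a rough sketch in plain Python. It is only a toy model of lazy lineage, not Spark's API: LazyDataset, run_two_stores and the evaluations counter are hypothetical names made up for this example, and collect() merely stands in for an action such as a Store.

```python
# Toy model of lazy lineage. This is NOT Spark's API; every name here is
# hypothetical and exists only to illustrate the recomputation issue.
class LazyDataset:
    def __init__(self, compute, parent=None):
        self._compute = compute    # function applied to the parent's rows
        self._parent = parent      # upstream dataset in the lineage
        self._cached = False       # set by cache()
        self._data = None          # materialized rows, kept only if cached
        self.evaluations = 0       # how many times this node was computed

    def map(self, fn):
        return LazyDataset(lambda rows: [fn(r) for r in rows], parent=self)

    def filter(self, pred):
        return LazyDataset(lambda rows: [r for r in rows if pred(r)], parent=self)

    def cache(self):
        # Only marks the dataset; nothing is computed until an action runs.
        self._cached = True
        return self

    def collect(self):
        # Stand-in for an action: walks the lineage unless a cached copy exists.
        if self._cached and self._data is not None:
            return self._data
        rows = self._parent.collect() if self._parent is not None else None
        self.evaluations += 1
        result = self._compute(rows)
        if self._cached:
            self._data = result
        return result


def run_two_stores(use_cache):
    a = LazyDataset(lambda _: list(range(6)))   # stand-in for a load
    b = a.map(lambda x: x * 10)                 # the shared relation "b"
    if use_cache:
        b.cache()
    store1 = b.filter(lambda x: x < 30).collect()    # first "Store"
    store2 = b.filter(lambda x: x >= 30).collect()   # second "Store"
    return store1, store2, b.evaluations
```

In this toy model, run_two_stores(False) computes b once per Store, while run_two_stores(True) computes it once and reuses the result for the second Store. The real behaviour in Spark also depends on storage level and available memory, which this sketch ignores.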

Besides this suggestion, do you have any other comments on this patch?



> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
>                 Key: PIG-4594
>                 URL: https://issues.apache.org/jira/browse/PIG-4594
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4594.patch, PIG-4594_1.patch
>
>
> https://builds.apache.org/job/Pig-spark/211/#showFailuresLink shows that 
> the following unit tests fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
