[
https://issues.apache.org/jira/browse/PIG-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976601#comment-14976601
]
ASF GitHub Bot commented on PIG-4675:
-------------------------------------
GitHub user linyunfeng opened a pull request:
https://github.com/apache/pig/pull/22
PIG-4675: Multi Store Statement will fail on the second store statement.
This might not be the ultimate solution for the MultiQueryOptimizer, but it
solves the issue for the join error.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/linyunfeng/pig PIG-4675
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/pig/pull/22.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22
----
commit 9d55a8b1c9bfcb8a77168dba895b7c95be332b07
Author: Peter Lin <[email protected]>
Date: 2015-10-27T15:29:01Z
PIG-4675: Multi Store Statement will fail on the second store statement.
----
> Multi Store Statement will fail on the second store statement.
> --------------------------------------------------------------
>
> Key: PIG-4675
> URL: https://issues.apache.org/jira/browse/PIG-4675
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Peter Lin
> Assignee: Peter Lin
> Fix For: spark-branch
>
> Attachments: name.txt, ssn.txt, test.pig
>
>
> We have recently been testing the Spark branch of Pig with MapR 3 and Spark 1.5.
> It turns out that if we use more than one store command in a Pig script, the
> second store command fails with an exception.
> SSN = load '/test/ssn.txt' using PigStorage() as (ssn:long);
> SSN_NAME = load '/test/name.txt' using PigStorage() as (ssn:long, name:chararray);
> X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn USING 'replicated';
> R1 = limit SSN_NAME 10;
> store R1 into '/tmp/test1_r1';
> store X into '/tmp/test1_x';
> Exception Details:
> 15/09/11 13:37:00 INFO storage.MemoryStore: ensureFreeSpace(114448) called
> with curMem=359237, maxMem=503379394
> 15/09/11 13:37:00 INFO storage.MemoryStore: Block broadcast_2 stored as
> values in memory (estimated size 111.8 KB, free 479.6 MB)
> 15/09/11 13:37:00 INFO storage.MemoryStore: ensureFreeSpace(32569) called
> with curMem=473685, maxMem=503379394
> 15/09/11 13:37:00 INFO storage.MemoryStore: Block broadcast_2_piece0 stored
> as bytes in memory (estimated size 31.8 KB, free 479.6 MB)
> 15/09/11 13:37:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
> memory on 10.51.2.82:55960 (size: 31.8 KB, free: 479.9 MB)
> 15/09/11 13:37:00 INFO spark.SparkContext: Created broadcast 2 from
> newAPIHadoopRDD at LoadConverter.java:88
> 15/09/11 13:37:00 WARN util.ClosureCleaner: Expected a closure; got
> org.apache.pig.backend.hadoop.executionengine.spark.converter.LoadConverter$ToTupleFunction
> 15/09/11 13:37:00 INFO spark.SparkLauncher: Converting operator POForEach
> (Name: SSN: New For Each(false)[bag] - scope-17 Operator Key: scope-17)
> 15/09/11 13:37:00 INFO spark.SparkLauncher: Converting operator POFRJoin
> (Name: X: FRJoin[tuple] - scope-22 Operator Key: scope-22)
> 15/09/11 13:37:00 ERROR spark.SparkLauncher: throw exception in
> sparkOperToRDD:
> java.lang.RuntimeException: Should have greater than1 predecessors for class
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.
> Got : 1
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkUtil.assertPredecessorSizeGreaterThan(SparkUtil.java:93)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.FRJoinConverter.convert(FRJoinConverter.java:55)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.FRJoinConverter.convert(FRJoinConverter.java:46)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:633)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:600)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.physicalToRDD(SparkLauncher.java:621)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkOperToRDD(SparkLauncher.java:552)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.sparkPlanToRDD(SparkLauncher.java:501)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:204)
> at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:301)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
> at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
> at org.apache.pig.PigServer.execute(PigServer.java:1364)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
> at
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
> at org.apache.pig.Main.run(Main.java:624)
> at org.apache.pig.Main.main(Main.java:170)
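For reference, below is a minimal sketch of the kind of predecessor-count assertion
that produces the error above. It is an illustration only: the class name, method
signature, and arguments are simplified assumptions, not the actual Pig source of
SparkUtil.assertPredecessorSizeGreaterThan. A replicated (FR) join needs at least
two predecessors, the main input plus the broadcast input, so a merged multi-query
plan that leaves POFRJoin with a single predecessor trips a check of this shape.

import java.util.List;

// Hypothetical, simplified stand-in for the predecessor-count check seen in the
// stack trace (SparkUtil.assertPredecessorSizeGreaterThan via FRJoinConverter).
public class PredecessorCheckSketch {

    // Throws when an operator has too few predecessor RDDs. For POFRJoin the
    // converter expects more than one: the main input plus the replicated input.
    static <T> void assertPredecessorSizeGreaterThan(List<T> predecessors,
                                                     Object physicalOperator,
                                                     int size) {
        if (predecessors.size() <= size) {
            throw new RuntimeException("Should have greater than " + size
                    + " predecessors for " + physicalOperator.getClass()
                    + ". Got : " + predecessors.size());
        }
    }

    public static void main(String[] args) {
        // With the multi-store plan merged, the FRJoin operator ends up seeing a
        // single predecessor, which reproduces the failure mode reported above.
        assertPredecessorSizeGreaterThan(List.of("mainInputRdd"), new Object(), 1);
    }
}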