[ https://issues.apache.org/jira/browse/PIG-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963991#comment-15963991 ]

Nandor Kollar commented on PIG-5176:
------------------------------------

On [Spark 2.1|https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L77] this works fine. Spark 1.6 throws an exception when you add a file with the same name and the same path, while Spark 2.1 accepts that case and only fails when the name is the same but the path differs. In our case these tests fail because the streaming script is shipped implicitly and we also ship it explicitly in the test Pig script. The patch I attached performs this check on the Pig side and skips adding a file that has already been added.
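
For illustration, here is a minimal sketch of that kind of Pig-side guard (the class and method names are hypothetical, not the attached patch's actual code; in the patch the check sits on the SparkLauncher shipping path visible in the stack trace below):

{code}
import java.io.File;
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical sketch: remember which file names were already shipped to
// the Spark job and skip duplicates instead of letting Spark reject them.
public class ShippedFileGuard {

    // Spark registers added files under their file name, so key by name.
    private final Set<String> shippedNames = new HashSet<>();

    public void addFileOnce(JavaSparkContext sparkContext, String path) {
        String name = new File(path).getName();
        if (shippedNames.add(name)) {
            // First occurrence of this name: actually ship the file.
            sparkContext.addFile(path);
        }
        // Otherwise the file was already added (e.g. the implicitly
        // shipped streaming script), so we silently skip it and avoid
        // the IllegalArgumentException.
    }
}
{code}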

> Several ComputeSpec test cases fail
> -----------------------------------
>
>                 Key: PIG-5176
>                 URL: https://issues.apache.org/jira/browse/PIG-5176
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5176.patch
>
>
> Several ComputeSpec test cases failed on my cluster:
> ComputeSpec_5 - ComputeSpec_13
> These scripts have a ship() clause in the DEFINE, and the ship includes 
> the script file itself, so we add the same file to the Spark context 
> twice. This is not a problem with Hadoop, but Spark doesn't like adding 
> the same file name twice (see the sketch after the stack trace):
> {code}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File PigStreamingDepend.pl already registered.
>         at scala.Predef$.require(Predef.scala:233)
>         at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:69)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1386)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1348)
>         at org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:662)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addResourceToSparkJobWorkingDirectory(SparkLauncher.java:462)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.shipFiles(SparkLauncher.java:371)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addFilesToSparkJob(SparkLauncher.java:357)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.uploadResources(SparkLauncher.java:235)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:222)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> {code}
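>
> For illustration, the duplicate registration can be sketched directly 
> against the Spark API (the path below is made up):
> {code}
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
>
> // Hypothetical repro: adding the same file to the SparkContext twice.
> public class DuplicateAddFile {
>     public static void main(String[] args) {
>         SparkConf conf = new SparkConf().setMaster("local").setAppName("repro");
>         JavaSparkContext sc = new JavaSparkContext(conf);
>         sc.addFile("/tmp/PigStreamingDepend.pl"); // registered under its file name
>         // Spark 1.6 throws IllegalArgumentException ("already registered")
>         // here; Spark 2.1 accepts it because the path is the same.
>         sc.addFile("/tmp/PigStreamingDepend.pl");
>         sc.stop();
>     }
> }
> {code}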



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)