[ https://issues.apache.org/jira/browse/PIG-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963991#comment-15963991 ]
Nandor Kollar commented on PIG-5176:
------------------------------------

On [Spark 2.1|https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L77] this works fine: Spark 1.6 throws an exception when you add a file with the same name and same path twice, but Spark 2.1 accepts that case; the only failure on 2.1 is adding a file with the same name but a different path. In our case these tests fail because the streaming script is implicitly shipped and we also explicitly ship it in the test Pig script. The patch I attached performs this check on the Pig side and won't add a file that has already been added.

> Several ComputeSpec test cases fail
> -----------------------------------
>
>                 Key: PIG-5176
>                 URL: https://issues.apache.org/jira/browse/PIG-5176
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5176.patch
>
>
> Several ComputeSpec test cases failed on my cluster:
> ComputeSpec_5 - ComputeSpec_13
> These scripts have a ship() part in the define, where the ship includes the
> script file too, so we add the same file to the Spark context twice. This is
> not a problem with Hadoop, but Spark doesn't allow adding the same file name
> twice:
> {code}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File
> PigStreamingDepend.pl already registered.
> 	at scala.Predef$.require(Predef.scala:233)
> 	at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:69)
> 	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1386)
> 	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1348)
> 	at org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:662)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addResourceToSparkJobWorkingDirectory(SparkLauncher.java:462)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.shipFiles(SparkLauncher.java:371)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addFilesToSparkJob(SparkLauncher.java:357)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.uploadResources(SparkLauncher.java:235)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:222)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
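The guard described in the comment (track which files were already handed to the Spark context and skip re-registration) could be sketched roughly as below. This is an illustrative sketch, not the actual PIG-5176 patch; the class and method names (ShippedFileTracker, addFileOnce) and the deduplication key are assumptions. It keys on the base file name, since the NettyStreamManager error above is raised per file name:

{code}
import java.util.HashSet;
import java.util.Set;

public class ShippedFileTracker {
    // Base names already registered with the Spark context.
    private final Set<String> addedFiles = new HashSet<>();

    /**
     * Returns true if the file should be registered now, false if a file
     * with the same base name was already shipped and the call should be
     * skipped (avoiding Spark's "already registered" IllegalArgumentException).
     */
    public boolean addFileOnce(String path) {
        // Spark keys registered files by base name, so the same script
        // shipped implicitly and via ship() collides even with equal paths.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (!addedFiles.add(fileName)) {
            return false; // duplicate: skip sparkContext.addFile(path)
        }
        // The real code would call sparkContext.addFile(path) here.
        return true;
    }

    public static void main(String[] args) {
        ShippedFileTracker tracker = new ShippedFileTracker();
        System.out.println(tracker.addFileOnce("/tmp/PigStreamingDepend.pl"));  // first add
        System.out.println(tracker.addFileOnce("/tmp/PigStreamingDepend.pl"));  // same path: skipped
    }
}
{code}

In the launcher this check would sit in front of the JavaSparkContext.addFile call in addResourceToSparkJobWorkingDirectory, so the implicit ship of the streaming script and the explicit ship() in the test script no longer collide.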