[ https://issues.apache.org/jira/browse/PIG-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5413:
------------------------------
    Attachment: pig-5413-v01.patch

This issue will be fixed when "PIG-5241: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java" is fixed. The underlying issue here is that SparkLauncher.cacheFiles creates a unique tmp file on every call, which prevents the Spark/Hadoop layer from skipping the redundant paths.
I took a quick look at PIG-5241 but couldn't figure out how Spark uses Hadoop's distributed cache, especially with "#" symlinks.
For now, I'm adding another layer of hack on top of the existing hack to avoid registering the same files more than once (when multiple jobs are submitted).

> [spark] TestStreaming.testInputCacheSpecs failing with "File script1.pl was already registered"
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-5413
>                 URL: https://issues.apache.org/jira/browse/PIG-5413
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5413-v01.patch
>
>
> {noformat}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File script1.pl was already registered with a different path (old path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache7028476439694979845/script1.pl, new path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache4167672945345635171/script1.pl
>     at scala.Predef$.require(Predef.scala:224)
>     at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:70)
>     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1559)
>     ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
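The patch itself is not shown here, so the following is only a rough illustration of the idea of "avoiding registering the same files more than once" when a fresh tmp copy is made per call. FileRegistry and DedupingCache are hypothetical names: FileRegistry merely imitates the name/path check behind the "already registered with a different path" error above, and neither class is the code in pig-5413-v01.patch nor part of Spark's or Pig's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CacheFileDedupSketch {

    // Hypothetical stand-in for Spark's file registry: it rejects a second
    // registration of the same file name under a different path, similar
    // to the requirement that fails in the stack trace above.
    static class FileRegistry {
        private final Map<String, String> pathsByName = new HashMap<>();

        void addFile(String name, String path) {
            String old = pathsByName.putIfAbsent(name, path);
            if (old != null && !old.equals(path)) {
                throw new IllegalArgumentException("File " + name
                    + " was already registered with a different path (old path = "
                    + old + ", new path = " + path + ")");
            }
        }

        int size() {
            return pathsByName.size();
        }
    }

    // Hypothetical dedup layer: remembers which file names were already
    // shipped and skips them on later job submissions, even though each
    // call has copied the file into a new unique tmp directory.
    static class DedupingCache {
        private final FileRegistry registry;
        private final Set<String> registered = new HashSet<>();

        DedupingCache(FileRegistry registry) {
            this.registry = registry;
        }

        /** Returns true if the file was registered, false if it was skipped. */
        boolean cacheFile(String name, String tmpPath) {
            if (registered.contains(name)) {
                return false; // already shipped by an earlier job
            }
            registry.addFile(name, tmpPath);
            registered.add(name); // only remembered after a successful add
            return true;
        }
    }

    public static void main(String[] args) {
        FileRegistry registry = new FileRegistry();
        DedupingCache cache = new DedupingCache(registry);
        // Two job submissions, each with its own tmp copy of script1.pl.
        boolean first = cache.cacheFile("script1.pl", "/tmp/cache7028/script1.pl");
        boolean second = cache.cacheFile("script1.pl", "/tmp/cache4167/script1.pl");
        System.out.println(first + " " + second + " " + registry.size()); // true false 1
    }
}
```

Keying on the bare file name mirrors the name-based check in the error message. It is a workaround in the same spirit as the comment's "hack"; the cleaner route suggested by PIG-5241 is to stop making a new tmp copy per call in the first place.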