Great. Thanks for the pointer. I see the fix is in 2.0.1-rc4.

Will there be a 1.6.3? If so, how are fixes considered for backporting?

From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Monday, October 3, 2016 5:40 AM
To: Kevin Grealish <kevin...@microsoft.com>
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: regression: no longer able to use HDFS wasbs:// path for 
additional python files on LIVY batch submit


On 1 Oct 2016, at 02:49, Kevin Grealish <kevin...@microsoft.com> wrote:

I’m seeing a regression when submitting a batch PySpark program with additional
files using LIVY. This is YARN cluster mode. The program files are placed into
the mounted Azure Storage before making the call to LIVY. This is happening
from an application which has credentials for the storage and the LIVY
endpoint, but not for the local file systems on the cluster. This previously
worked, but now I’m getting the error below.
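
For context, here is a minimal sketch of the kind of Livy batch submission
involved; the Livy endpoint, storage account, container, and file names are
hypothetical placeholders, not values taken from this setup:

# Sketch of a Livy /batches submission whose main program and extra Python
# files live in Azure Storage (wasb:// URIs), not on the cluster's local
# file system. Endpoint and paths below are hypothetical.
import requests

LIVY_URL = "http://livy-host:8998/batches"  # hypothetical Livy endpoint

payload = {
    "file": "wasb://mycontainer@myaccount.blob.core.windows.net/app/main.py",
    "pyFiles": [
        "wasb://mycontainer@myaccount.blob.core.windows.net/app/helpers.py",
    ],
    "args": ["--date", "2016-10-01"],
}

# The caller only needs credentials for the storage account and the Livy
# endpoint; it never touches the cluster's local file systems directly.
resp = requests.post(LIVY_URL, json=payload)
resp.raise_for_status()
print(resp.json())  # Livy returns the batch id and its current state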

Seems this restriction was introduced with
https://github.com/apache/spark/commit/5081a0a9d47ca31900ea4de570de2cbb0e063105
(new in 1.6.2 and 2.0.0).
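
For illustration only, and not Spark's actual code: the restriction amounts to
a scheme check on each Python file path, roughly like the sketch below, so any
non-local URI such as wasb:// is rejected before submission.

# Rough illustration of the kind of check implied by the stack trace
# further down; this is NOT Spark's implementation.
from urllib.parse import urlparse

def require_local_py_file(path):
    # Only local paths (no scheme, file://, or local://) pass the check.
    scheme = urlparse(path).scheme
    if scheme not in ("", "file", "local"):
        raise ValueError("Only local Python files are supported, got: %s" % path)
    return path

# A wasb:// URI fails this check, which produces the error shown below, e.g.:
# require_local_py_file("wasb://mycontainer@myaccount.blob.core.windows.net/app.py")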

How should the scenario above be achieved now? Am I missing something?

This has been fixed in https://issues.apache.org/jira/browse/SPARK-17512;
I don't know if it's in 2.0.1, though.



Exception in thread "main" java.lang.IllegalArgumentException: Launching Python applications through spark-submit is currently only supported for local files: wasb://kevingreclust...@xxxxxxxx.blob.core.windows.net/xxxxxxxxx/xxxxxxx.py
        at org.apache.spark.deploy.PythonRunner$.formatPath(PythonRunner.scala:104)
        at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
        at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.deploy.PythonRunner$.formatPaths(PythonRunner.scala:136)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:639)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:637)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:637)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.Exception: spark-submit exited with code 1.
