[jira] [Comment Edited] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

Pratik Malani (Jira) Tue, 11 Jul 2023 06:34:03 -0700


    [ https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742006#comment-17742006 ]


Pratik Malani edited comment on SPARK-33782 at 7/11/23 1:33 PM:
----------------------------------------------------------------

Hi [~pralabhkumar] 

The latest update to SparkSubmit.scala is causing a NoSuchFileException.
The jar shown below is present at /opt/spark/work-dir/, but the Files.copy 
call in SparkSubmit.scala fails.
Could you please help check what the possible cause could be?
{code:java}
Files  local:///opt/spark/work-dir/sample.jar from /opt/spark/work-dir/sample.jar to /opt/spark/work-dir/./sample.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/sample.jar
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
        at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
        at java.nio.file.Files.copy(Files.java:1274)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$14(SparkSubmit.scala:437)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.deploy.SparkSubmit.downloadResourcesToCurrentDirectory$1(SparkSubmit.scala:424)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$17(SparkSubmit.scala:449)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:449)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
 {code}
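
The trace fails inside Files.copy with a NoSuchFileException on the source path, while the log line shows the source and the destination resolving to the same file (/opt/spark/work-dir/sample.jar and /opt/spark/work-dir/./sample.jar). One possible explanation, not confirmed in this thread, is that the destination is cleared before the copy, which would also remove the source when both paths name the same file. The Java sketch below illustrates that failure mode and one way to guard against it. The class SafeCopy and its methods copyInto and deleteThenCopy are hypothetical names used only for illustration; they are not Spark APIs and do not reproduce the actual SparkSubmit code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of Spark) for copying a resource into a directory. */
final class SafeCopy {
    private SafeCopy() {}

    /**
     * Copies source into targetDir under the same file name and returns the
     * destination. The copy is skipped when both paths already refer to the
     * same file, so the source is never touched in that case.
     */
    static Path copyInto(Path source, Path targetDir) throws IOException {
        // normalize() turns ".../work-dir/./sample.jar" into ".../work-dir/sample.jar".
        Path dest = targetDir.resolve(source.getFileName()).normalize();
        // Check for identity before modifying dest in any way.
        if (Files.exists(dest) && Files.isSameFile(source, dest)) {
            return dest;
        }
        return Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * Assumed failure mode, shown for comparison: clearing the destination
     * first also deletes the source when they are the same file, so the
     * subsequent copy throws NoSuchFileException for the source path.
     */
    static Path deleteThenCopy(Path source, Path targetDir) throws IOException {
        Path dest = targetDir.resolve(source.getFileName());
        Files.deleteIfExists(dest);
        return Files.copy(source, dest);
    }
}
```

Under these assumptions, calling SafeCopy.copyInto with the jar path and a target of "/opt/spark/work-dir/." would return the existing file unchanged, whereas SafeCopy.deleteThenCopy with the same arguments would fail in the same way as the trace above. Whether this matches the real cause would need to be checked against SparkSubmit.scala itself.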


was (Author: JIRAUSER296450):
Hi [~pralabhkumar] 

The latest update to SparkSubmit.scala is causing a NoSuchFileException.
The jar shown below is present at the stated location, but the Files.copy 
call in SparkSubmit.scala fails.
Could you please help check what the possible cause could be?
{code:java}
Files  local:///opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar from /opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar to /opt/spark/work-dir/./database-scripts-1.1-SNAPSHOT.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
        at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
        at java.nio.file.Files.copy(Files.java:1274)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$14(SparkSubmit.scala:437)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.deploy.SparkSubmit.downloadResourcesToCurrentDirectory$1(SparkSubmit.scala:424)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$17(SparkSubmit.scala:449)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:449)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
 {code}

> Place spark.files, spark.jars and spark.files under the current working 
> directory on the driver in K8S cluster mode
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33782
>                 URL: https://issues.apache.org/jira/browse/SPARK-33782
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Hyukjin Kwon
>            Assignee: Pralabh Kumar
>            Priority: Major
>             Fix For: 3.4.0
>
>
> In YARN cluster mode, the passed files can be accessed in the 
> current working directory. It looks like this is not the case in Kubernetes 
> cluster mode.
> With the files placed there, users can, for example, leverage PEX to manage 
> Python dependencies in Apache Spark:
> {code}
> pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
> PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
> {code}
> See also https://github.com/apache/spark/pull/30735/files#r540935585.



