Santosh Pingale created SPARK-42170:
---------------------------------------
Summary: Files added to the spark-submit command with master K8s
and deploy mode cluster, end up in a non deterministic location inside the
driver.
Key: SPARK-42170
URL: https://issues.apache.org/jira/browse/SPARK-42170
Project: Spark
Issue Type: Bug
Components: Kubernetes, Spark Submit
Affects Versions: 3.2.2, 3.3.0
Reporter: Santosh Pingale
Files added to the spark-submit command with master K8s and deploy mode
cluster end up in a non-deterministic location inside the driver.
e.g.:
{{spark-submit --files myfile --master k8s.. --deploy-mode cluster}} will upload
the files to {{/tmp/spark-uuid/myfile}}
The issue happens because
[Utils.createTempDir()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L344]
creates a directory with a UUID in the directory name, so the path differs on
every submission. This issue does not affect the `--archives` option, because
the archives are unarchived into a destination directory that is relative to
the working dir. This bug affects file access both before and after app
creation. For example, if we distribute Python dependencies with pex, we need
to use `--files` to attach the pex file and set spark.pyspark.python to point
to this file, but the file location cannot be determined before submitting the
app. After the app is created, referencing the files without using
`SparkFiles.get` also does not work.
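To illustrate why the path cannot be known ahead of time, here is a minimal stdlib-only sketch that mimics the behaviour described above. It is not Spark code: `create_temp_dir` and `stage_file` are hypothetical helpers, and the `spark-<uuid>` directory naming is an assumption based on the example path in this report.

```python
import os
import tempfile
import uuid


def create_temp_dir(root: str, prefix: str = "spark") -> str:
    """Create a fresh directory under root whose name embeds a random UUID.

    Hypothetical stand-in for the temp-dir behaviour described in the report.
    """
    path = os.path.join(root, f"{prefix}-{uuid.uuid4()}")
    os.makedirs(path)
    return path


def stage_file(root: str, name: str, content: str) -> str:
    """Write a file into a new UUID-named directory and return its full path."""
    target = os.path.join(create_temp_dir(root), name)
    with open(target, "w") as handle:
        handle.write(content)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first = stage_file(root, "myfile", "payload")
        second = stage_file(root, "myfile", "payload")
        # Same file name, but a different parent directory on every run.
        print(first)
        print(second)
        # A path chosen before staging (e.g. relative to root) does not exist.
        print(os.path.exists(os.path.join(root, "myfile")))
```

Two stagings of the same file name land in different directories, so any path fixed before submission (as spark.pyspark.python would need) cannot point at the file.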