Hyukjin Kwon created SPARK-33782:
------------------------------------

             Summary: Place spark.files and spark.jars under the 
current working directory on the driver in K8S
                 Key: SPARK-33782
                 URL: https://issues.apache.org/jira/browse/SPARK-33782
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.2.0
            Reporter: Hyukjin Kwon


In YARN cluster mode, files passed to the application can be accessed from the 
current working directory. This does not appear to be the case in Kubernetes 
cluster mode.

If the passed files were placed under the current working directory on the 
driver, users could, for example, use PEX to manage Python dependencies in 
Apache Spark:

{code}
pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
{code}
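
One way to observe the difference between the two cluster modes is to check, from the driver, whether the submitted file names resolve against the current working directory. The following is only a rough sketch: the helper name `missing_from_cwd` is hypothetical and not part of Spark, and it assumes the file keeps its base name after distribution.

```python
import os


def missing_from_cwd(file_names, cwd=None):
    """Return the entries of file_names that are NOT regular files directly
    under cwd (defaults to the process's current working directory).

    Hypothetical helper for illustration; not a Spark API.
    """
    base = cwd if cwd is not None else os.getcwd()
    return [
        name
        for name in file_names
        # Assumption: a distributed file keeps its base name.
        if not os.path.isfile(os.path.join(base, os.path.basename(name)))
    ]


if __name__ == "__main__":
    # "myarchive.pex" is the file passed via --files in the example above.
    print(missing_from_cwd(["myarchive.pex"]))
```

If the files are placed in the working directory as requested, calling this on the driver should return an empty list; a non-empty list names the files that a relative path such as `./myarchive.pex` would fail to find.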

See also https://github.com/apache/spark/pull/30735/files#r540935585.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
