Maxime Nannan created SPARK-25978:
-------------------------------------
Summary: PySpark can only be used via spark-submit in the spark-py
Docker image for Kubernetes
Key: SPARK-25978
URL: https://issues.apache.org/jira/browse/SPARK-25978
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Maxime Nannan
Currently, in the spark-py Docker image for Kubernetes, defined by the Dockerfile at
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile,
PYTHONPATH is set as follows:
{code:java}
ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-*.zip
{code}
I think the problem is that PYTHONPATH entries do not support wildcards, so py4j
cannot be imported with the default PYTHONPATH, and pyspark cannot be imported
either, since it depends on py4j.
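For illustration, a minimal reproduction sketch, assuming a shell inside the stock spark-py image (where SPARK_HOME is /opt/spark): the glob entry reaches the interpreter verbatim, because Python never glob-expands sys.path entries, so py4j is not importable outside of spark-submit.
{code:java}
# Hypothetical shell session inside the spark-py image.
$ echo "$PYTHONPATH"
/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip

# The 'py4j-*.zip' entry stays on sys.path literally; no file has that
# name, so the import fails with an ImportError (exact message varies
# by Python version).
$ python -c "import py4j"
{code}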
The wildcard problem does not impact spark-submit of Python files, because py4j
is dynamically added to PYTHONPATH when the Python process is launched in
core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala.
It's not really an issue, as the main purpose of that Docker image is to run
as the driver or executors on k8s, but it's worth mentioning.
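If one wanted to fix it anyway, a possible sketch (illustrative only, not the project's actual change; it assumes SPARK_HOME is already set by the base image) would be to resolve the glob at image build time so PYTHONPATH carries a concrete path:
{code:java}
# Copy the versioned py4j zip to a stable name when the image is built
# (the RUN shell expands the glob), then reference that concrete path
# in PYTHONPATH instead of a wildcard.
RUN cp ${SPARK_HOME}/python/lib/py4j-*.zip ${SPARK_HOME}/python/lib/py4j.zip
ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j.zip
{code}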