Willi Raschkowski created SPARK-39659:
-----------------------------------------
Summary: Add environment bin folder to R/Python subprocess PATH
Key: SPARK-39659
URL: https://issues.apache.org/jira/browse/SPARK-39659
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.0
Reporter: Willi Raschkowski
Some Python packages rely on non-Python executables which are usually made
available on the {{PATH}} through something like {{conda activate}}.
When using Spark with conda-pack environments added via {{spark.archives}},
Python packages aren't able to find conda-installed executables because Spark
doesn't update {{PATH}}.
E.g.
{code:python|title=test.py}
import plotly.express as px

# This only works if kaleido-python can find the conda-installed executable
fig = px.scatter(px.data.iris(), x="sepal_length", y="sepal_width",
                 color="species")
fig.write_image("figure.png", engine="kaleido")
{code}
and
{code:java}
./bin/spark-submit --master yarn --deploy-mode cluster \
  --archives environment.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  test.py
{code}
will throw
{code:java}
Traceback (most recent call last):
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/kaleido-test.py", line 7, in <module>
    fig.write_image("figure.png", engine="kaleido")
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3829, in write_image
    return pio.write_image(self, *args, **kwargs)
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 267, in write_image
    img_data = to_image(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 144, in to_image
    img_bytes = scope.transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/plotly.py", line 153, in transform
    response = self._perform_transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 293, in _perform_transform
    self._ensure_kaleido()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 176, in _ensure_kaleido
    proc_args = self._build_proc_args()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 123, in _build_proc_args
    proc_args = [self.executable_path(), self.scope_name]
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 99, in executable_path
    raise ValueError(
ValueError:
The kaleido executable is required by the kaleido Python library, but it was not included
in the Python package and it could not be found on the system PATH.
Searched for included kaleido executable at:
    /tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/executable/kaleido
Searched for executable 'kaleido' on the following system PATH:
    /usr/local/sbin
    /usr/local/bin
    /usr/sbin
    /usr/bin
    /sbin
    /bin
{code}
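Until Spark updates {{PATH}} itself, a job could approximate the requested behavior by prepending the environment's {{bin}} folder from inside the script. The following is only a minimal sketch of that idea, not Spark's implementation: the helper name {{prepend_env_bin_to_path}} is hypothetical, and it assumes the interpreter set via {{PYSPARK_PYTHON}} lives in the unpacked environment's {{bin}} directory, as it does in the {{spark-submit}} command above.

```python
import os
import sys


def prepend_env_bin_to_path(python_executable=None, environ=None):
    """Put the directory that holds the Python interpreter first on PATH.

    Hypothetical helper, not part of Spark or conda. It assumes the
    interpreter sits in the environment's bin folder, next to any
    conda-installed executables such as kaleido.
    """
    if python_executable is None:
        python_executable = sys.executable
    if environ is None:
        environ = os.environ

    # abspath (not realpath) keeps the container-local path and does not
    # follow the symlinks YARN creates for localized archives.
    bin_dir = os.path.dirname(os.path.abspath(python_executable))

    # Split the current PATH, dropping empty entries.
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        entries.insert(0, bin_dir)

    environ["PATH"] = os.pathsep.join(entries)
    return bin_dir


# Call this before importing or using packages that shell out to
# executables installed in the environment.
prepend_env_bin_to_path()
```

This only changes the process that runs it. Code executed in Python workers on executors would need the same call, which is part of why handling this in Spark is preferable.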