Willi Raschkowski created SPARK-39659:
-----------------------------------------

             Summary: Add environment bin folder to R/Python subprocess PATH
                 Key: SPARK-39659
                 URL: https://issues.apache.org/jira/browse/SPARK-39659
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: Willi Raschkowski


Some Python packages rely on non-Python executables, which are usually made available on the {{PATH}} by something like {{conda activate}}.

When Spark runs with a conda-pack environment added via {{spark.archives}}, Python packages can't find the executables conda installed into that environment, because Spark doesn't add the environment's {{bin}} folder to the {{PATH}} of the Python (or R) subprocesses it launches.
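A minimal sketch of the behavior this issue asks for, written in Python for illustration only. It is not Spark's launcher code, and {{build_worker_env}} is a hypothetical helper. It assumes the Python executable is configured as a path inside the unpacked archive (e.g. {{./environment/bin/python}}), so its parent directory is the environment's {{bin}} folder. Putting that folder first on the child's {{PATH}} approximates what {{conda activate}} does for executable lookup.

{code:python}
import os
from typing import Dict, Mapping


def build_worker_env(python_exec: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: copy base_env and put python_exec's bin folder first on PATH."""
    env = dict(base_env)  # never mutate the caller's environment
    # "./environment/bin/python" -> "<working dir>/environment/bin"
    bin_dir = os.path.dirname(os.path.abspath(python_exec))
    # Drop empty entries and any existing copy of bin_dir, then prepend it
    rest = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    env["PATH"] = os.pathsep.join([bin_dir] + rest)
    return env


if __name__ == "__main__":
    worker_env = build_worker_env("./environment/bin/python", {"PATH": "/usr/bin:/bin"})
    print(worker_env["PATH"])  # the environment's bin folder, then /usr/bin and /bin
{code}

A launcher could then pass the result as {{env=}} to {{subprocess.Popen}}. Prepending rather than appending lets the environment's executables shadow system ones, as they would after {{conda activate}}.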

To reproduce, take this script:
{code:python|title=test.py}
import plotly.express as px

# This only works if the kaleido Python package can find the conda-installed kaleido executable
fig = px.scatter(px.data.iris(), x="sepal_length", y="sepal_width", color="species")
fig.write_image("figure.png", engine="kaleido")
{code}
Submitting it with
{code:bash}
./bin/spark-submit --master yarn --deploy-mode cluster \
  --archives environment.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  test.py
{code}
fails with:
{code:java}
Traceback (most recent call last):
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/kaleido-test.py", line 7, in <module>
    fig.write_image("figure.png", engine="kaleido")
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3829, in write_image
    return pio.write_image(self, *args, **kwargs)
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 267, in write_image
    img_data = to_image(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/plotly/io/_kaleido.py", line 144, in to_image
    img_bytes = scope.transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/plotly.py", line 153, in transform
    response = self._perform_transform(
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 293, in _perform_transform
    self._ensure_kaleido()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 176, in _ensure_kaleido
    proc_args = self._build_proc_args()
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 123, in _build_proc_args
    proc_args = [self.executable_path(), self.scope_name]
  File "/tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/scopes/base.py", line 99, in executable_path
    raise ValueError(
ValueError:
The kaleido executable is required by the kaleido Python library, but it was not included
in the Python package and it could not be found on the system PATH.

Searched for included kaleido executable at:
    /tmp/hadoop-hadoop/nm-local-dir/usercache/wraschkowski/appcache/application_1656456739406_0012/container_1656456739406_0012_01_000001/environment/lib/python3.10/site-packages/kaleido/executable/kaleido

Searched for executable 'kaleido' on the following system PATH:
    /usr/local/sbin
    /usr/local/bin
    /usr/sbin
    /usr/bin
    /sbin
    /bin
{code}
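Until Spark does this itself, one possible workaround is to patch {{PATH}} at the top of the job script. This is a sketch, not a confirmed fix. It assumes the script runs under the packed interpreter (as with {{PYSPARK_PYTHON=./environment/bin/python}} above), so the directory of {{sys.executable}} is the environment's {{bin}} folder. It also assumes kaleido searches {{PATH}} when {{write_image}} is called, which the traceback above suggests. {{prepend_interpreter_bin_to_path}} is a hypothetical name.

{code:python}
import os
import shutil
import sys


def prepend_interpreter_bin_to_path() -> str:
    """Hypothetical workaround: put the running interpreter's folder first on PATH.

    Assumes sys.executable lives in the conda-pack environment's bin folder,
    which is where conda installed the `kaleido` executable.
    """
    bin_dir = os.path.dirname(os.path.abspath(sys.executable))
    rest = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    # Updating os.environ also affects shutil.which and subprocesses started later
    os.environ["PATH"] = os.pathsep.join([bin_dir] + rest)
    return bin_dir


if __name__ == "__main__":
    prepend_interpreter_bin_to_path()  # call this before fig.write_image(...)
    # Prints the kaleido executable's path if it is now on PATH, otherwise None
    print(shutil.which("kaleido"))
{code}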



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
