[ https://issues.apache.org/jira/browse/SPARK-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-9235.
---------------------------------
Resolution: Incomplete
> PYSPARK_DRIVER_PYTHON env variable is not set on the YARN Node manager acting
> as driver in yarn-cluster mode
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-9235
> URL: https://issues.apache.org/jira/browse/SPARK-9235
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.4.1, 1.5.0
> Environment: CentOS 6.6, python 2.7, Spark 1.4.1 tagged version, YARN
> Cluster Manager, CDH 5.4.1 (Hadoop 2.6.0++), Java 1.7
> Reporter: Aaron Glahe
> Priority: Minor
> Labels: bulk-closed
>
> Relates to SPARK-9229
> Env: Spark on YARN, Java 1.7, CentOS 6.6, CDH 5.4.1 (Hadoop 2.6.0++),
> Anaconda Python 2.7.10 "installed" in /srv/software directory
> On the client/submitting machine, we set the PYSPARK_DRIVER_PYTHON env var in
> spark-env.sh to point to the Anaconda Python executable, which was present on
> every YARN node:
> export PYSPARK_DRIVER_PYTHON='/srv/software/anaconda/bin/python'
> As a side note, export PYSPARK_PYTHON='/srv/software/anaconda/bin/python' was
> also set in spark-env.sh.
> Run the command:
> spark-submit test.py --master yarn --deploy-mode cluster
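> The contents of test.py are not shown in this report; a one-line driver such
> as the sketch below (a reproduction aid, not the reporter's script) is enough
> to see which interpreter actually runs the driver, since it writes
> sys.executable to the driver container's stdout:
>
>   # Hypothetical minimal driver for reproduction (assumed, not the original test.py)
>   printf 'import sys\nprint(sys.executable)\n' > test.py
>   spark-submit --master yarn --deploy-mode cluster test.py
>   # Inspect the driver container's stdout in the aggregated YARN logs:
>   yarn logs -applicationId <application_id>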
> It appears that the YARN NodeManager hosting the driver does not use the
> Python specified by PYSPARK_DRIVER_PYTHON, but instead uses the CentOS system
> default (which in this case is Python 2.6).
> The workaround appears to be setting the Python path via the
> SPARK_YARN_USER_ENV variable.
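> For reference, a minimal sketch of what that workaround might look like; the
> exact settings used are not shown in the report, so the lines below are an
> assumption based on the documented Spark-on-YARN options for this version:
>
>   # In spark-env.sh on the submitting machine: forward the variables to the
>   # YARN containers (SPARK_YARN_USER_ENV takes a comma-separated list).
>   export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/srv/software/anaconda/bin/python,PYSPARK_DRIVER_PYTHON=/srv/software/anaconda/bin/python"
>
>   # Alternatively, set the driver-side (Application Master) environment
>   # explicitly at submit time for cluster mode:
>   spark-submit --master yarn --deploy-mode cluster \
>     --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/srv/software/anaconda/bin/python \
>     test.py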