[
https://issues.apache.org/jira/browse/SPARK-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Kimbrel updated SPARK-9229:
--------------------------------
Environment: CentOS, Cloudera 5.4.1 based on Apache Hadoop 2.6.0, using
Spark 1.5.0 built for Hadoop 2.6.0 from the GitHub master branch on 7.20.2015
(was: centos )
> pyspark yarn-cluster PYSPARK_PYTHON not set
> --------------------------------------------
>
> Key: SPARK-9229
> URL: https://issues.apache.org/jira/browse/SPARK-9229
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.5.0
> Environment: CentOS, Cloudera 5.4.1 based on Apache Hadoop 2.6.0,
> using Spark 1.5.0 built for Hadoop 2.6.0 from the GitHub master branch on
> 7.20.2015
> Reporter: Eric Kimbrel
>
> PYSPARK_PYTHON is set in spark-env.sh to use an alternative python
> installation.
> Use spark-submit to run a pyspark job on yarn with cluster deploy mode.
> PYSPARK_PYTHON is not set in the cluster environment, so the system default
> python is used instead of the intended interpreter.
> test code (simple.py):
> from pyspark import SparkConf, SparkContext
> import sys, os
> conf = SparkConf()
> sc = SparkContext(conf=conf)
> out = [('PYTHON VERSION', str(sys.version))]
> out.extend(zip(os.environ.keys(), os.environ.values()))
> rdd = sc.parallelize(out)
> rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
> submit command:
> spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py
> I've also tried setting PYSPARK_PYTHON on the command line with no effect.
> It seems like there is no way to specify an alternative python executable in
> yarn-cluster mode.
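> A possible workaround sketch, assuming Spark's documented YARN properties
> spark.yarn.appMasterEnv.* and spark.executorEnv.* are honored in this build:
> pass the variable through Spark conf rather than spark-env.sh. The
> interpreter path /opt/python/bin/python below is an assumption, not taken
> from this report.

```shell
# Hypothetical workaround sketch: forward PYSPARK_PYTHON to both the
# YARN application master (which runs the driver in cluster mode) and
# the executors via Spark configuration properties.
# /opt/python/bin/python is a placeholder path.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/python/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/python/bin/python \
  simple.py
```

> Whether this takes effect can be checked with the environment dump that
> simple.py writes to hdfs://namenode/tmp/env.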
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]