[ https://issues.apache.org/jira/browse/SPARK-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Kimbrel closed SPARK-9229.
-------------------------------
    Resolution: Invalid

It turns out I just missed the correct configuration option: the Python
executable for the cluster environment has to be passed via
SPARK_YARN_USER_ENV.

I couldn't find this anywhere in the documentation.
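For reference, a minimal sketch of the workaround: export the desired
interpreter through SPARK_YARN_USER_ENV before submitting (the interpreter
path below is a placeholder, not from this report):

```shell
# Propagate the target Python interpreter into the YARN container
# environment. /opt/anaconda/bin/python is a hypothetical path --
# substitute the interpreter installed on your cluster nodes.
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/opt/anaconda/bin/python"

# Same submit command as in the report below; the job should now pick
# up the exported interpreter instead of the system default.
spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py
```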

> pyspark yarn-cluster  PYSPARK_PYTHON not set
> --------------------------------------------
>
>                 Key: SPARK-9229
>                 URL: https://issues.apache.org/jira/browse/SPARK-9229
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: CentOS, Cloudera 5.4.1 (based on Apache Hadoop 2.6.0), 
> using Spark 1.5.0 built for Hadoop 2.6.0 from the GitHub master branch on 
> 7/20/2015
>            Reporter: Eric Kimbrel
>
> PYSPARK_PYTHON is set in spark-env.sh to use an alternative python 
> installation.
> Use spark-submit to run a pyspark job in yarn with cluster deploy mode.
> PYSPARK_PYTHON is not set in the cluster environment, and the system default 
> python is used instead of the intended one.
> test code: (simple.py)
> from pyspark import SparkConf, SparkContext
> import sys,os
> conf = SparkConf()
> sc = SparkContext(conf=conf)
> out = [('PYTHON VERSION',str(sys.version))]
> out.extend( zip( os.environ.keys(),os.environ.values() ) )
> rdd = sc.parallelize(out)
> rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
> submit command:
> spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py 
> I've also tried setting PYSPARK_PYTHON on the command line with no effect.
> It seems like there is no way to specify an alternative python executable in 
> yarn-cluster mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)