[
https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649062#comment-14649062
]
Min Wu commented on SPARK-8646:
-------------------------------
Hi, I hit the same issue when running a PySpark program in yarn-client mode
with Spark 1.4.1 from BigInsights 4.1 (Ambari). Because the assembly jar no
longer contains the Python scripts for pyspark and py4j, I set the Spark home
via SparkContext.setSparkHome() to the spark-client location (since this is an
Ambari Hadoop install, the spark-client directory contains a python folder
with the py4j and pyspark scripts). The API documentation says this setting is
applied to slave nodes, and I assumed it would also apply to Spark on YARN,
but it does not work: the worker nodes always take their PYTHONPATH from the
cached assembly jar. After reading the SparkContext code, it seems sparkHome
is stored in SparkConf as "spark.home", so perhaps this parameter should be
distributed to all executors so PySpark can use it to locate the PYTHONPATH
as well.
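The workaround described above can be sketched as a spark-submit invocation
that puts the spark-client Python sources on the executors' PYTHONPATH
explicitly. This is a sketch, not a confirmed fix: the install path, py4j zip
version, and application script name below are assumptions for illustration,
not values from this report.

```shell
# Assumed spark-client layout on an Ambari/BigInsights node; adjust to your install.
export SPARK_HOME=/usr/iop/current/spark-client
# The pyspark and py4j sources live under $SPARK_HOME/python, not in the
# assembly jar; the py4j zip name varies by Spark version (assumed here).
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip

# spark.executorEnv.* forwards an environment variable to every executor,
# so the workers stop falling back to the cached assembly jar.
$SPARK_HOME/bin/spark-submit \
  --master yarn-client \
  --conf spark.executorEnv.PYTHONPATH="$PYTHONPATH" \
  my_app.py   # hypothetical application script
```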
> PySpark does not run on YARN if master not provided in command line
> -------------------------------------------------------------------
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in
> my own code.
> Reporter: Juliet Hougland
> Assignee: Lianhui Wang
> Fix For: 1.5.0
>
> Attachments: executor.log, pi-test.log,
> spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log,
> spark1.4-SPARK_HOME-set.log, spark1.4-verbose.log, verbose-executor.log
>
>
> Running PySpark jobs results in a "no module named pyspark" error when run
> in yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This is not a binary-compatible change to Spark. Scripts that worked on
> previous Spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.
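Per the issue title, the failure appears only when the master is not given on
the command line. A hedged workaround, reusing the reporter's own command from
the environment section above, is to pass the master via --master instead of
as a trailing application argument:

```shell
# Same job as in the report, but with the master passed to spark-submit
# itself rather than parsed from the application's positional arguments.
$SPARK_HOME/bin/spark-submit --master yarn-client \
  outofstock/data_transform.py \
  hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/
```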
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]