[
https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649062#comment-14649062
]
Min Wu commented on SPARK-8646:
-------------------------------
Hi, I hit the same issue when running a PySpark program in yarn-client mode
with Spark 1.4.1 from BigInsights 4.1 (Ambari). Because the assembly jar no
longer contains the Python scripts for pyspark and py4j, I set the Spark home
via SparkContext.setSparkHome() to the spark-client location (since this is an
Ambari Hadoop install, the spark-client directory contains a python folder
with the py4j and pyspark scripts). The API documentation says this setting is
applied to slave nodes, and I assumed it would also apply to Spark on YARN,
but it does not work: the worker nodes always take their PYTHONPATH from the
cached assembly jar. After reading the SparkContext code, it seems sparkHome
is stored in SparkConf as "spark.home", so perhaps this parameter should be
distributed to all executors so PySpark can use it to locate the PYTHONPATH
as well.
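The workaround described above can be sketched as a spark-submit invocation
that puts the spark-client Python sources on the executors' PYTHONPATH
explicitly. This is a sketch, not a confirmed fix: the install path, py4j zip
version, and application script name below are assumptions for illustration,
not values from this report.

```shell
# Assumed spark-client layout on an Ambari/BigInsights node; adjust to your install.
export SPARK_HOME=/usr/iop/current/spark-client
# The pyspark and py4j sources live under $SPARK_HOME/python, not in the
# assembly jar; the py4j zip name varies by Spark version (assumed here).
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip

# spark.executorEnv.* forwards an environment variable to every executor,
# so the workers stop falling back to the cached assembly jar.
$SPARK_HOME/bin/spark-submit \
  --master yarn-client \
  --conf spark.executorEnv.PYTHONPATH="$PYTHONPATH" \
  my_app.py   # hypothetical application script
```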
> PySpark does not run on YARN if master not provided in command line
> -------------------------------------------------------------------
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in
> my own code.
> Reporter: Juliet Hougland
> Assignee: Lianhui Wang
> Fix For: 1.5.0
>
> Attachments: executor.log, pi-test.log,
> spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log,
> spark1.4-SPARK_HOME-set.log, spark1.4-verbose.log, verbose-executor.log
>
>
> Running PySpark jobs results in a "no module named pyspark" error when run
> in yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This is not a binary-compatible change to Spark. Scripts that worked on
> previous Spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.
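Per the issue title, the failure appears only when the master is not given on
the command line. A hedged workaround, reusing the reporter's own command from
the environment section above, is to pass the master via --master instead of
as a trailing application argument:

```shell
# Same job as in the report, but with the master passed to spark-submit
# itself rather than parsed from the application's positional arguments.
$SPARK_HOME/bin/spark-submit --master yarn-client \
  outofstock/data_transform.py \
  hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/
```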
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]