[https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603228#comment-14603228]
Marcelo Vanzin commented on SPARK-8646:
---------------------------------------
Hi [~j_houg],
Seems there's something weird going on in your setup. I downloaded the 1.4
hadoop 2.6 archive you're using, and ran this command line, without setting any
extra env variables:
{code}
HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --master yarn-client
examples/src/main/python/pi.py
{code}
And it works. Notably, I see these two lines that seem to be missing from your
logs:
{noformat}
15/06/26 10:14:28 INFO yarn.Client: Uploading resource
file:/tmp/spark-1.4.0-bin-hadoop2.6/python/lib/pyspark.zip ->
hdfs://vanzin-st1-1.vpc.cloudera.com:8020/user/systest/.sparkStaging/application_1435333340717_0002/pyspark.zip
15/06/26 10:14:28 INFO yarn.Client: Uploading resource
file:/tmp/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip ->
hdfs://vanzin-st1-1.vpc.cloudera.com:8020/user/systest/.sparkStaging/application_1435333340717_0002/py4j-0.8.2.1-src.zip
{noformat}
That's the code added by the change you mention; uploading those zips is
actually what allows PySpark to run with that large assembly (which Python
cannot read).
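As an aside, the reason shipping {{pyspark.zip}} works at all is that Python can import modules directly from a zip archive on {{sys.path}} (via the built-in zipimport machinery). A minimal sketch, using an invented throwaway zip and module name rather than Spark's actual files:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing one module, mimicking how pyspark.zip
# bundles the pyspark package. (mylib.zip / mymod are made-up names.)
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "mylib.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

# Putting the zip on sys.path lets Python import straight from it;
# spark-submit does the equivalent with pyspark.zip on the YARN side.
sys.path.insert(0, zip_path)
import mymod

print(mymod.VALUE)  # 42
```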
Can you double check the command line you're running (or try the simple example
above)? Also, make sure your {{$SPARK_HOME/conf}} directory is not pointing at
some other Spark configuration, and that you don't have any other env variables
that might be affecting Spark's configuration.
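To spot that kind of interference quickly, one option is to dump the environment variables that commonly redirect Spark's configuration before running {{spark-submit}}. The variable list below is illustrative, not exhaustive:

```python
import os

def spark_env_snapshot():
    """Collect env vars that commonly redirect Spark's configuration
    or Python path. (Illustrative subset, not an exhaustive list.)"""
    names = ("SPARK_HOME", "SPARK_CONF_DIR", "HADOOP_CONF_DIR", "PYTHONPATH")
    return {name: os.environ.get(name) for name in names}

# Print each so a stray setting (e.g. a PYTHONPATH shadowing the
# bundled pyspark.zip) is easy to spot at a glance.
for name, value in spark_env_snapshot().items():
    print(f"{name}={value}")
```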
> PySpark does not run on YARN
> ----------------------------
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in
> my own code.
> Reporter: Juliet Hougland
> Attachments: spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set.log
>
>
> Running pyspark jobs results in a "no module named pyspark" error when run in
> yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This does not represent a binary compatible change to Spark. Scripts that
> worked on previous Spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)