[https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603228#comment-14603228]
Marcelo Vanzin commented on SPARK-8646:
---------------------------------------
Hi [~j_houg],
Seems there's something weird going on in your setup. I downloaded the 1.4
hadoop 2.6 archive you're using, and ran this command line, without setting any
extra env variables:
{code}
HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --master yarn-client
examples/src/main/python/pi.py
{code}
And it works. Notably, I see these two lines that seem to be missing from your
logs:
{noformat}
15/06/26 10:14:28 INFO yarn.Client: Uploading resource
file:/tmp/spark-1.4.0-bin-hadoop2.6/python/lib/pyspark.zip ->
hdfs://vanzin-st1-1.vpc.cloudera.com:8020/user/systest/.sparkStaging/application_1435333340717_0002/pyspark.zip
15/06/26 10:14:28 INFO yarn.Client: Uploading resource
file:/tmp/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip ->
hdfs://vanzin-st1-1.vpc.cloudera.com:8020/user/systest/.sparkStaging/application_1435333340717_0002/py4j-0.8.2.1-src.zip
{noformat}
That's the code added by the change you mention; uploading those zips is
actually what allows PySpark to run with that large assembly (which Python
cannot read).
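As an aside, the reason shipping {{pyspark.zip}} works at all is that Python can import modules directly from a zip archive on {{sys.path}} (via the built-in zipimport machinery). A minimal sketch, using an invented throwaway zip and module name rather than Spark's actual files:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing one module, mimicking how pyspark.zip
# bundles the pyspark package. (mylib.zip / mymod are made-up names.)
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "mylib.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

# Putting the zip on sys.path lets Python import straight from it;
# spark-submit does the equivalent with pyspark.zip on the YARN side.
sys.path.insert(0, zip_path)
import mymod

print(mymod.VALUE)  # 42
```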
Can you double check the command line you're running (or try the simple example
above)? Also, make sure your {{$SPARK_HOME/conf}} directory is not pointing at
some other Spark configuration, and that you don't have any other env variables
that might be affecting Spark's configuration.
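To spot that kind of interference quickly, one option is to dump the environment variables that commonly redirect Spark's configuration before running {{spark-submit}}. The variable list below is illustrative, not exhaustive:

```python
import os

def spark_env_snapshot():
    """Collect env vars that commonly redirect Spark's configuration
    or Python path. (Illustrative subset, not an exhaustive list.)"""
    names = ("SPARK_HOME", "SPARK_CONF_DIR", "HADOOP_CONF_DIR", "PYTHONPATH")
    return {name: os.environ.get(name) for name in names}

# Print each so a stray setting (e.g. a PYTHONPATH shadowing the
# bundled pyspark.zip) is easy to spot at a glance.
for name, value in spark_env_snapshot().items():
    print(f"{name}={value}")
```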
> PySpark does not run on YARN
> ----------------------------
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in
> my own code.
> Reporter: Juliet Hougland
> Attachments: spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set.log
>
>
> Running pyspark jobs results in a "no module named pyspark" error when run in
> yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This does not represent a binary compatible change to Spark. Scripts that
> worked on previous Spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)