[ https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614856#comment-14614856 ]

Juliet Hougland commented on SPARK-8646:
----------------------------------------

[~sowen] The pandas error came when I tried to run the pi job-- which doesn't 
import pandas at all. The only imports in 
$SPARK_1.4_HOME/examples/src/main/python/pi.py are as follows:

    import sys
    from random import random
    from operator import add
    from pyspark import SparkContext

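For reference, the rest of the example, continuing from those imports, is just a Monte Carlo estimate of pi, roughly like the sketch below (paraphrased, not the verbatim file); nothing in it touches pandas:

    sc = SparkContext(appName="PythonPi")
    n = 100000

    def inside(_):
        # Sample a point in the unit square; count it if it falls
        # inside the unit circle.
        x, y = random() * 2 - 1, random() * 2 - 1
        return 1 if x * x + y * y < 1 else 0

    count = sc.parallelize(range(n)).map(inside).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()
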
PySpark itself doesn't require pandas (and if it does, that should be documented), 
so it is wrong for the pi job, which doesn't require pandas, to fail with a 
"pandas not found" error: at no point should the pi job, or PySpark itself, need 
pandas. The pandas error is very, very weird, but not obviously related to this 
ticket. The problem I reported here is that pyspark itself is not shipped to, or 
is otherwise unavailable on, the worker nodes when I run a PySpark app from 
Spark 1.4 using YARN.
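
(For anyone hitting the driver-side variant of the import error: a common 
workaround, assuming SPARK_HOME is set, is to put the pyspark sources and the 
bundled py4j zip on sys.path before importing pyspark, roughly as below. The 
py4j zip name varies by release, hence the glob. That said, this would not fix 
pyspark being missing on the YARN workers, which is what this ticket is about.)

    import glob
    import os
    import sys

    # Driver-side workaround sketch: make pyspark importable from a
    # plain Python interpreter by pointing sys.path at SPARK_HOME.
    # Assumes SPARK_HOME is set; the py4j zip name is release-specific.
    spark_home = os.environ["SPARK_HOME"]
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.extend(glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))

    from pyspark import SparkContext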

> PySpark does not run on YARN
> ----------------------------
>
>                 Key: SPARK-8646
>                 URL: https://issues.apache.org/jira/browse/SPARK-8646
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.4.0
>         Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py 
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in 
> my own code.
>            Reporter: Juliet Hougland
>         Attachments: pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log, 
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, 
> spark1.4-SPARK_HOME-set.log
>
>
> Running pyspark jobs results in a "no module named pyspark" error when run in 
> yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error|https://issues.apache.org/jira/browse/SPARK-6869]
> This is not a backwards-compatible change to Spark. Scripts that worked on 
> previous Spark versions (i.e. commands that use spark-submit) should continue 
> to work without modification between minor versions.


