[jira] [Commented] (SPARK-8646) PySpark does not run on YARN

Juliet Hougland (JIRA) Mon, 13 Jul 2015 13:03:37 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625245#comment-14625245
 ]


Juliet Hougland commented on SPARK-8646:
----------------------------------------

[~lianhuiwang] in $SPARK_HOME/conf I only have the spark-defaults.conf.template 
file, not a non-template version. I also do not set the spark master to local 
programmatically.

[~vanzin] The command logged to stderr is:
    
Spark Command: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/bin/java -cp 
/home/juliet/bin/spark-1.4.0-bin-hadoop2.6/conf/:/home/juliet/bin/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/juliet/bin/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/juliet/bin/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/juliet/bin/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/etc/hadoop/conf/
 -Xms512m -Xmx512m -XX:
MaxPermSize=128m org.apache.spark.deploy.SparkSubmit --verbose 
outofstock/data_transform.py
hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex7/ yarn-client

(sorry for the way the classpath gets chopped up between lines.) yarn-client is 
getting passed as a argument to my code, but because I am not specifying the 
master via the cli --master flag or via spark-defaults.conf it does not affect 
how the job initially starts up.

> PySpark does not run on YARN
> ----------------------------
>
>                 Key: SPARK-8646
>                 URL: https://issues.apache.org/jira/browse/SPARK-8646
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.4.0
>         Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py 
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in 
> my own code.
>            Reporter: Juliet Hougland
>         Attachments: executor.log, pi-test.log, 
> spark1.4-SPARK_HOME-set-PYTHONPATH-set.log, 
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, 
> spark1.4-SPARK_HOME-set.log, spark1.4-verbose.log, verbose-executor.log
>
>
> Running pyspark jobs result in a "no module named pyspark" when run in 
> yarn-client mode in spark 1.4.
> [I believe this JIRA represents the change that introduced this error.| 
> https://issues.apache.org/jira/browse/SPARK-6869 ]
> This does not represent a binary compatible change to spark. Scripts that 
> worked on previous spark versions (ie comands the use spark-submit) should 
> continue to work without modification between minor versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-8646) PySpark does not run on YARN

Reply via email to