[ https://issues.apache.org/jira/browse/SPARK-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381844#comment-14381844 ]
Lianhui Wang edited comment on SPARK-6506 at 3/26/15 1:17 PM:
--------------------------------------------------------------

Hi [~tgraves], I am running 1.3.0. If I do not set SPARK_HOME on every node, I get the following exception in every executor:

Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /data/yarnenv/local/usercache/lianhui/filecache/296/spark-assembly-1.3.0-hadoop2.2.0.jar/python
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

From the exception I can see that the pyspark module inside the Spark assembly jar localized on the NodeManager cannot be used, and I do not know why. [~andrewor14], can you help me? So I think we should either add the Spark directories to PYTHONPATH or set SPARK_HOME on every node.

> python support yarn cluster mode requires SPARK_HOME to be set
> --------------------------------------------------------------
>
>                 Key: SPARK-6506
>                 URL: https://issues.apache.org/jira/browse/SPARK-6506
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.3.0
>            Reporter: Thomas Graves
>
> We added support for Python running in yarn cluster mode in
> https://issues.apache.org/jira/browse/SPARK-5173, but it requires that
> SPARK_HOME be set in the environment variables for the application master and
> executors. It doesn't have to be set to anything real, but it fails if it's not
> set. See the command at the end of: https://github.com/apache/spark/pull/3976
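Since the description notes that SPARK_HOME only has to be set, not point at a real installation, a minimal workaround sketch (my suggestion, not something verified on this ticket) is to inject a placeholder value into both the application master and executor environments at submit time, using the existing spark.yarn.appMasterEnv.* and spark.executorEnv.* configs:

  # Hedged workaround sketch: SPARK_HOME only needs to exist in the env,
  # so pass a dummy value to both the AM and the executors.
  spark-submit \
    --master yarn-cluster \
    --conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null \
    --conf spark.executorEnv.SPARK_HOME=/dev/null \
    your_script.py   # stand-in name for the real PySpark job

Here /dev/null and your_script.py are illustrative placeholders; the point is only that the variable is defined in both environments so the Python worker startup path does not fail on the missing setting.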