[ https://issues.apache.org/jira/browse/SPARK-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381844#comment-14381844 ]
Lianhui Wang edited comment on SPARK-6506 at 3/26/15 1:17 PM:
--------------------------------------------------------------

Hi [~tgraves], I am running 1.3.0. If I do not set SPARK_HOME on every node, I get the following exception in every executor:

Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /data/yarnenv/local/usercache/lianhui/filecache/296/spark-assembly-1.3.0-hadoop2.2.0.jar/python
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

From the exception I can see that the pyspark module inside the Spark assembly jar localized on the NodeManager cannot be used, and I do not know why. [~andrewor14], can you help me? So I think we should either add the Spark directories to PYTHONPATH or set SPARK_HOME on every node.

> python support yarn cluster mode requires SPARK_HOME to be set
> --------------------------------------------------------------
>
>                 Key: SPARK-6506
>                 URL: https://issues.apache.org/jira/browse/SPARK-6506
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.3.0
>            Reporter: Thomas Graves
>
> We added support for Python running in yarn cluster mode in
> https://issues.apache.org/jira/browse/SPARK-5173, but it requires that
> SPARK_HOME be set in the environment variables for the application master and
> executors. It doesn't have to be set to anything real, but it fails if it's not
> set. See the command at the end of: https://github.com/apache/spark/pull/3976
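Since the description notes that SPARK_HOME only has to be set, not point at a real installation, a minimal workaround sketch (my suggestion, not something verified on this ticket) is to inject a placeholder value into both the application master and executor environments at submit time, using the existing spark.yarn.appMasterEnv.* and spark.executorEnv.* configs:

  # Hedged workaround sketch: SPARK_HOME only needs to exist in the env,
  # so pass a dummy value to both the AM and the executors.
  spark-submit \
    --master yarn-cluster \
    --conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null \
    --conf spark.executorEnv.SPARK_HOME=/dev/null \
    your_script.py   # stand-in name for the real PySpark job

Here /dev/null and your_script.py are illustrative placeholders; the point is only that the variable is defined in both environments so the Python worker startup path does not fail on the missing setting.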