[
https://issues.apache.org/jira/browse/SPARK-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Or updated SPARK-1688:
-----------------------------
Description:
Currently, if pyspark cannot be loaded, this happens:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:183)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:55)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:42)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:97)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
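For context, the EOFException comes from the port handshake between the JVM
and the Python daemon. A simplified sketch of that pattern follows (this is
not the actual PythonWorkerFactory code; it just illustrates the mechanism):

    import java.io.DataInputStream

    // Simplified sketch of the daemon handshake; illustrative only.
    val daemon = new ProcessBuilder("python", "-m", "pyspark.daemon").start()
    val in = new DataInputStream(daemon.getInputStream)
    // If python dies on startup (e.g. an ImportError because pyspark is not
    // on PYTHONPATH), its stdout closes with no bytes written, and readInt
    // throws java.io.EOFException instead of returning the daemon's port.
    val daemonPort = in.readInt()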
This can be caused by a few things:
(1) PYTHONPATH is not set
(2) PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
(3) The jar does not contain pyspark files (YARN)
(4) The jar does not contain py4j files (YARN)
We should have an explicit error message for each of these. For (2)-(4), we
should also print out the PYTHONPATH so the user doesn't have to SSH into the
executor machines to figure it out. A sketch of such checks follows.
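A minimal sketch of what these checks might look like. All helper names and
jar entry paths below are illustrative assumptions, not the actual patch:

    import java.io.File
    import java.util.jar.JarFile
    import org.apache.spark.SparkException

    // Hypothetical helper; the real fix would live in PythonWorkerFactory.
    object PythonPathChecks {
      def validate(): Unit = {
        // Cause (1): PYTHONPATH is not set at all.
        val pythonPath = sys.env.getOrElse("PYTHONPATH",
          throw new SparkException("PYTHONPATH is not set on this executor."))

        // Cause (2): PYTHONPATH contains neither Spark's python directory
        // nor (on YARN) the assembly jar. Echo the path so the user does not
        // have to SSH into the executor to inspect it.
        val entries = pythonPath.split(File.pathSeparator).filter(_.nonEmpty)
        val sparkEntry = entries.find(e => e.endsWith("python") || e.endsWith(".jar"))
        if (sparkEntry.isEmpty) {
          throw new SparkException("PYTHONPATH does not include Spark's python " +
            s"directory or assembly jar. PYTHONPATH=$pythonPath")
        }

        // Causes (3) and (4): on YARN, the assembly jar must actually bundle
        // the pyspark and py4j sources. The entry names assume a layout where
        // both packages sit at the root of the jar.
        sparkEntry.filter(_.endsWith(".jar")).foreach { jarPath =>
          val jar = new JarFile(jarPath)
          try {
            if (jar.getEntry("pyspark/__init__.py") == null) {
              throw new SparkException(s"$jarPath does not contain pyspark files. " +
                s"PYTHONPATH=$pythonPath")
            }
            if (jar.getEntry("py4j/__init__.py") == null) {
              throw new SparkException(s"$jarPath does not contain py4j files. " +
                s"PYTHONPATH=$pythonPath")
            }
          } finally {
            jar.close()
          }
        }
      }
    }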
> PySpark throws unhelpful exception when pyspark cannot be loaded
> ----------------------------------------------------------------
>
> Key: SPARK-1688
> URL: https://issues.apache.org/jira/browse/SPARK-1688
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core
> Affects Versions: 0.9.1
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 1.0.0
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)