[
https://issues.apache.org/jira/browse/SPARK-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Or updated SPARK-1688:
-----------------------------
Description:
Currently, if pyspark cannot be loaded, this happens:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:183)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:55)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:42)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:97)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:57)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
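For context, the EOFException comes from the port handshake between the JVM
and the Python daemon. A simplified sketch of that pattern follows (this is
not the actual PythonWorkerFactory code; it just illustrates the mechanism):

    import java.io.DataInputStream

    // Simplified sketch of the daemon handshake; illustrative only.
    val daemon = new ProcessBuilder("python", "-m", "pyspark.daemon").start()
    val in = new DataInputStream(daemon.getInputStream)
    // If python dies on startup (e.g. an ImportError because pyspark is not
    // on PYTHONPATH), its stdout closes with no bytes written, and readInt
    // throws java.io.EOFException instead of returning the daemon's port.
    val daemonPort = in.readInt()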
This can be caused by a few things:
(1) PYTHONPATH is not set
(2) PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
(3) The jar does not contain pyspark files (YARN)
(4) The jar does not contain py4j files (YARN)
We should have an explicit error message for each of these. For (2)-(4), we
should also print out the PYTHONPATH so the user doesn't have to SSH into the
executor machines to figure it out. A sketch of such checks follows.
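A minimal sketch of what these checks might look like. All helper names and
jar entry paths below are illustrative assumptions, not the actual patch:

    import java.io.File
    import java.util.jar.JarFile
    import org.apache.spark.SparkException

    // Hypothetical helper; the real fix would live in PythonWorkerFactory.
    object PythonPathChecks {
      def validate(): Unit = {
        // Cause (1): PYTHONPATH is not set at all.
        val pythonPath = sys.env.getOrElse("PYTHONPATH",
          throw new SparkException("PYTHONPATH is not set on this executor."))

        // Cause (2): PYTHONPATH contains neither Spark's python directory
        // nor (on YARN) the assembly jar. Echo the path so the user does not
        // have to SSH into the executor to inspect it.
        val entries = pythonPath.split(File.pathSeparator).filter(_.nonEmpty)
        val sparkEntry = entries.find(e => e.endsWith("python") || e.endsWith(".jar"))
        if (sparkEntry.isEmpty) {
          throw new SparkException("PYTHONPATH does not include Spark's python " +
            s"directory or assembly jar. PYTHONPATH=$pythonPath")
        }

        // Causes (3) and (4): on YARN, the assembly jar must actually bundle
        // the pyspark and py4j sources. The entry names assume a layout where
        // both packages sit at the root of the jar.
        sparkEntry.filter(_.endsWith(".jar")).foreach { jarPath =>
          val jar = new JarFile(jarPath)
          try {
            if (jar.getEntry("pyspark/__init__.py") == null) {
              throw new SparkException(s"$jarPath does not contain pyspark files. " +
                s"PYTHONPATH=$pythonPath")
            }
            if (jar.getEntry("py4j/__init__.py") == null) {
              throw new SparkException(s"$jarPath does not contain py4j files. " +
                s"PYTHONPATH=$pythonPath")
            }
          } finally {
            jar.close()
          }
        }
      }
    }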
> PySpark throws unhelpful exception when pyspark cannot be loaded
> ----------------------------------------------------------------
>
> Key: SPARK-1688
> URL: https://issues.apache.org/jira/browse/SPARK-1688
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core
> Affects Versions: 0.9.1
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 1.0.0
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)