[jira] [Commented] (SPARK-13303) Spark fails with pandas import error when pandas is not explicitly imported by user

holdenk (JIRA) Fri, 07 Oct 2016 22:03:54 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557230#comment-15557230
 ]


holdenk commented on SPARK-13303:
---------------------------------

What if we added a requirements file? We already have one for our dev tools, 
so having one for PySpark itself should be pretty reasonable.

> Spark fails with pandas import error when pandas is not explicitly imported 
> by user
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-13303
>                 URL: https://issues.apache.org/jira/browse/SPARK-13303
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.0
>         Environment: The Python installation used by the driver (edge node) 
> has pandas installed on it, while the data nodes do not have pandas 
> installed in the Python runtimes they use. Pandas is never explicitly 
> imported by pi.py.
>            Reporter: Juliet Hougland
>
> Running `spark-submit pi.py` results in:
>   File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
>     return self.loads(obj)
>   File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
>     return pickle.loads(obj)
> ImportError: No module named pandas.algos
>       at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>       at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:88)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> This is unexpected, and it is hard for users to work out why they see this 
> error, as they themselves have not explicitly done anything with pandas.
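
The traceback above ends in pickle.loads on the worker, which suggests the 
pickled command refers to a module that exists only on the driver. A minimal 
sketch of that failure mode follows, using only the standard library. It is 
not PySpark code: the module name driver_only_lib and the helpers 
make_payload and load_on_worker are hypothetical stand-ins, and the sketch 
assumes Python 3.

```python
import pickle
import sys
import types


def make_payload():
    """Pickle an object whose class lives in a module only the 'driver' has."""
    # Hypothetical stand-in for a driver-only library such as pandas.
    mod = types.ModuleType("driver_only_lib")

    class Marker(object):
        pass

    # Make the class look as if it were defined at the top of that module,
    # so pickle records "driver_only_lib.Marker" as its import path.
    Marker.__module__ = "driver_only_lib"
    Marker.__qualname__ = "Marker"
    mod.Marker = Marker
    sys.modules["driver_only_lib"] = mod
    try:
        return pickle.dumps(Marker())
    finally:
        # Simulate the worker side: the module is not installed there.
        del sys.modules["driver_only_lib"]


def load_on_worker(payload):
    """Return the import error message hit while unpickling, or None."""
    try:
        pickle.loads(payload)
    except ImportError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(load_on_worker(make_payload()))
```

Unpickling imports the recorded module by name, so the error surfaces on 
whichever side lacks the module, even though the user's script never imports 
it directly.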



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
