Juliet Hougland created SPARK-13303:
---------------------------------------

             Summary: Spark fails with pandas import error when pandas is not 
explicitly imported by user
                 Key: SPARK-13303
                 URL: https://issues.apache.org/jira/browse/SPARK-13303
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.6.0
         Environment: The Python installation used by the driver (edge node)
has pandas installed, while the Python runtimes used on the data nodes do not
have pandas installed. Pandas is never explicitly imported by pi.py.
            Reporter: Juliet Hougland


Running `spark-submit pi.py` results in:

  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
ImportError: No module named pandas.algos

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


This is unexpected, and it is hard for users to work out why they see this
error, since they have not explicitly done anything with pandas themselves.
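
One plausible mechanism behind the trace above, offered as a sketch rather than a confirmed diagnosis: pickle serializes many objects by reference (module name plus attribute name), so unpickling imports the named module on whichever machine calls `pickle.loads`. If anything on the driver pulls in a module that the workers lack, the worker can fail even though the user script never imports that module. The module name `driver_only_lib` and both function names below are hypothetical and exist only for illustration. The example is plain Python 3 and does not use Spark.

```python
import pickle
import sys
import types

# Hypothetical module name standing in for a library (such as pandas) that
# is installed on the driver but not on the workers.
MODULE_NAME = "driver_only_lib"


def pickle_on_driver():
    """Pickle a reference to a function that lives in a driver-only module."""
    module = types.ModuleType(MODULE_NAME)

    def helper(x):
        return x + 1

    # Make the function look like a top-level member of the module so that
    # pickle serializes it by reference (module name + attribute name).
    helper.__module__ = MODULE_NAME
    helper.__qualname__ = "helper"
    module.helper = helper
    sys.modules[MODULE_NAME] = module
    return pickle.dumps(helper)


def unpickle_on_worker(payload):
    """Simulate a worker whose Python runtime lacks the module."""
    # Dropping the module from sys.modules mimics a runtime where it was
    # never installed: the import triggered by pickle.loads cannot succeed.
    sys.modules.pop(MODULE_NAME, None)
    try:
        pickle.loads(payload)
    except ImportError as exc:
        return type(exc).__name__ + ": " + str(exc)
    return "unpickled without error"


if __name__ == "__main__":
    print(unpickle_on_worker(pickle_on_driver()))
```

The simulated worker never mentions `driver_only_lib` in its own code, yet unpickling still raises an import error naming that module, which mirrors the situation described in this report.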


