Juliet Hougland created SPARK-13303:
---------------------------------------
Summary: Spark fails with pandas import error when pandas is not
explicitly imported by user
Key: SPARK-13303
URL: https://issues.apache.org/jira/browse/SPARK-13303
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.6.0
Environment: The python installation used by the driver (edge node)
has pandas installed, while the python runtimes used on the data nodes do not
have pandas installed. Pandas is never explicitly imported by
pi.py.
Reporter: Juliet Hougland
Running `spark-submit pi.py` results in:
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
ImportError: No module named pandas.algos
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
This is unexpected, and it is hard for users to work out why they see this
error, since they themselves have not explicitly done anything with pandas.
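
For illustration only, here is a minimal sketch of the general mechanism the
traceback points at: pickle stores objects by module and class name, so
`pickle.loads` has to import that module on the receiving side and raises
ImportError when it is missing. This does not identify which object in the
pi.py job actually references pandas; the report does not say. The module name
`driver_only_lib` and the class `Thing` are hypothetical stand-ins, and the
"driver" and "worker" steps are simulated in one Python 3 process rather than
run through Spark.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library that exists only on the driver.
MODULE_NAME = "driver_only_lib"


class Thing:
    """Toy payload; pickle records it by module name and class name."""

    def __init__(self, value):
        self.value = value


def pickle_on_driver():
    # Simulate the driver: the module is importable, so pickling succeeds.
    module = types.ModuleType(MODULE_NAME)
    Thing.__module__ = MODULE_NAME
    module.Thing = Thing
    sys.modules[MODULE_NAME] = module
    return pickle.dumps(Thing(42))


def unpickle_on_worker(payload):
    # Simulate a worker: the module is not installed, so loading must fail.
    sys.modules.pop(MODULE_NAME, None)
    try:
        pickle.loads(payload)
    except ImportError as exc:
        return exc
    return None


if __name__ == "__main__":
    error = unpickle_on_worker(pickle_on_driver())
    print(type(error).__name__, error)
```

The user code never calls `import driver_only_lib` at load time, yet
the failure still surfaces as an import error inside `pickle.loads`, which
matches the shape of the traceback above.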