[
https://issues.apache.org/jira/browse/SPARK-21753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-21753.
----------------------------------
Resolution: Fixed
Assignee: Kyle Kelley
Fix Version/s: 2.3.0
Fixed in https://github.com/apache/spark/pull/18734
> running pi example with pypy on spark fails to serialize
> ---------------------------------------------------------
>
> Key: SPARK-21753
> URL: https://issues.apache.org/jira/browse/SPARK-21753
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.1
> Reporter: Thomas Graves
> Assignee: Kyle Kelley
> Fix For: 2.3.0
>
>
> I'm trying to run the pi example
> (https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py)
> on pyspark using pypy 2.5.1 but everything I've tried results in a
> serialization error:
> Traceback (most recent call last):
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 147,
> in dump
> return Pickler.dump(self, obj)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 224, in dump
> self.save(obj)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 562, in save_tuple
> save(element)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 254,
> in save_function
> self.save_function_tuple(obj)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 291,
> in save_function_tuple
> save((code, closure, base_globals))
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 548, in save_tuple
> save(element)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 600, in save_list
> self._batch_appends(iter(obj))
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 633, in _batch_appends
> save(x)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 254,
> in save_function
> self.save_function_tuple(obj)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 291,
> in save_function_tuple
> save((code, closure, base_globals))
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 548, in save_tuple
> save(element)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 600, in save_list
> self._batch_appends(iter(obj))
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 636, in _batch_appends
> save(tmp[0])
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 248,
> in save_function
> self.save_function_tuple(obj)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 296,
> in save_function_tuple
> save(f_globals)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 653, in save_dict
> self._batch_setitems(obj.iteritems())
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 690, in _batch_setitems
> save(v)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 447,
> in save_instancemethod
> obj=obj)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 581,
> in save_reduce
> save(args)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 548, in save_tuple
> save(element)
> File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py",
> line 286, in save
> f(self, obj) # Call unbound method with explicit self
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 246,
> in save_function
> if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is
> None:
> AttributeError: 'builtin-code' object has no attribute 'co_filename'
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 834, in reduce
> vals = self.mapPartitions(func).collect()
> File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 808, in collect
> port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2440, in _jrdd
> self._jrdd_deserializer, profiler)
> File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2373, in
> _wrap_function
> pickled_command, broadcast_vars, env, includes =
> _prepare_for_python_RDD(sc, command)
> File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2359, in
> _prepare_for_python_RDD
> pickled_command = ser.dumps(command)
> File "/home/tgraves/y-spark-git/python/pyspark/serializers.py", line 460,
> in dumps
> return cloudpickle.dumps(obj, 2)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 703,
> in dumps
> cp.dump(obj)
> File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 160,
> in dump
> raise pickle.PicklingError(msg)
> It looks like the issue is with serializing random(). If you remove random()
> from the function then everything works fine.
> I'm just running
> PYSPARK_PYTHON=//home/tgraves/pypy-my-own-package-name/bin/pypy ./bin/pyspark
> I've tried multiple versions of pypy from 2.5.1 to 5.8.0. I tried the
> portable version as well as built pypy from source.
> If it works for others perhaps I have a setup issue, any hints on that would
> be appreciated.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]