[ https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482308#comment-17482308 ]
Pedro Larroy edited comment on SPARK-36476 at 1/26/22, 9:59 PM:
----------------------------------------------------------------
This seems to be caused by an interaction with the "dill" package, and it only happens on Python 3.7.
This is explained here, and I verified the reproduction in my codebase:
[https://stackoverflow.com/questions/69360462/conflict-between-dill-and-pickle-while-using-pyspark]
https://github.com/cloudpipe/cloudpickle/issues/393
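The failing frame in the quoted traceback is dill's `save_cell`, which reads `obj.cell_contents`. In plain Python, a closure cell whose outer variable has been deleted is empty, and reading its contents raises the same `ValueError: Cell is empty`. A minimal sketch with no Spark, dill or cloudpickle involved (the function names are illustrative only), including a guarded read that treats an empty cell as a valid state:

```python
def make_closure_with_empty_cell():
    value = 42

    def inner():
        return value  # `value` is a free variable, stored in a closure cell

    del value  # deleting the outer variable leaves the shared cell empty
    return inner


def read_cell(cell):
    """Return (True, contents) for a filled cell, (False, None) for an empty one."""
    try:
        return True, cell.cell_contents
    except ValueError:  # CPython raises ValueError("Cell is empty") here
        return False, None


if __name__ == "__main__":
    fn = make_closure_with_empty_cell()
    print(read_cell(fn.__closure__[0]))  # (False, None)
```

An unguarded read, like `f = obj.cell_contents` in the traceback, fails on exactly this kind of cell.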
was (Author: larroy):
This seems to be caused by an interaction with the "dill" package, and it only happens on Python 3.7.
This is explained here, and I verified the reproduction in my codebase:
https://stackoverflow.com/questions/69360462/conflict-between-dill-and-pickle-while-using-pyspark
> cloudpickle: ValueError: Cell is empty
> --------------------------------------
>
> Key: SPARK-36476
> URL: https://issues.apache.org/jira/browse/SPARK-36476
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.1.2
> Reporter: Oliver Mannion
> Priority: Major
>
> {code:python}
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps
>     return cloudpickle.dumps(obj, pickle_protocol)
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 101, in dumps
>     cp.dump(obj)
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump
>     return Pickler.dump(self, obj)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 437, in dump
>     self.save(obj)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple
>     save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 722, in save_function
>     *self._dynamic_function_reduce(obj), obj=obj
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 659, in _save_reduce_pickle5
>     dictitems=dictitems, obj=obj
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 638, in save_reduce
>     save(args)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 789, in save_tuple
>     save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 774, in save_tuple
>     save(element)
>   File "/Users/tekumara/.pyenv/versions/3.7.9/lib/python3.7/pickle.py", line 504, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/Users/tekumara/code/awesome-spark-app/.venv/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell
>     f = obj.cell_contents
> ValueError: Cell is empty
> {code}
> Doesn't occur in Spark 3.0.0, so it was possibly introduced when cloudpickle was
> upgraded to 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-32094).
> It also doesn't occur in Spark 3.1.2 with Python 3.8.
>
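The Python 3.7 frames in the traceback run through the pure-Python `pickle.py` (`pickle._Pickler`), so any handler registered in `pickle._Pickler.dispatch` for closure cells is used while cloudpickle saves a function. The linked Stack Overflow thread attributes dill's `save_cell` ending up there to importing dill; that is an assumption here, not checked against every dill release. A rough diagnostic sketch (hypothetical helper names) that reports which handler, if any, the pure-Python pickler would use for a cell:

```python
import pickle
import sys


def closure_cell_type():
    """Return the closure cell type (types.CellType only exists on Python >= 3.8)."""
    value = None

    def inner():
        return value

    return type(inner.__closure__[0])


def pure_python_cell_handler():
    """Handler pickle._Pickler would use for a closure cell, or None.

    The stock dispatch table has no entry for cells; a non-None result means
    some library (assumed here to be dill) has registered one.
    """
    return pickle._Pickler.dispatch.get(closure_cell_type())


def describe():
    handler = pure_python_cell_handler()
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "dill_imported": "dill" in sys.modules,
        # Module that defines the registered handler, e.g. a dill module, or None.
        "cell_handler_module": getattr(handler, "__module__", None),
    }


if __name__ == "__main__":
    print(describe())
```

On a clean interpreter `cell_handler_module` is `None`; in a Python 3.7 environment where the error reproduces, a value pointing at dill would be consistent with the explanation in the linked threads.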