[ https://issues.apache.org/jira/browse/SPARK-54065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-54065:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix `test_in_memory_data_source` in Python 3.14
> -----------------------------------------------
>
>                 Key: SPARK-54065
>                 URL: https://issues.apache.org/jira/browse/SPARK-54065
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: pull-request-available
>
> {code}
> ======================================================================
> ERROR [0.007s]: test_in_memory_data_source (pyspark.sql.tests.test_python_datasource.PythonDataSourceTests.test_in_memory_data_source)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/serializers.py", line 460, in dumps
>     return cloudpickle.dumps(obj, pickle_protocol)
>            ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1537, in dumps
>     cp.dump(obj)
>     ~~~~~~~^^^^^
>   File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1303, in dump
>     return super().dump(obj)
>            ~~~~~~~~~~~~^^^^^
> TypeError: cannot pickle '_abc._abc_data' object
> when serializing dict item '_abc_impl'
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item '__annotate_func__'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item 'reader'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 283, in test_in_memory_data_source
>     self.spark.dataSource.register(InMemoryDataSource)
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/sql/datasource.py", line 1197, in register
>     wrapped = _wrap_function(sc, dataSource)
>   File "/__w/spark/spark/python/pyspark/sql/udf.py", line 59, in _wrap_function
>     pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
>                                                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/core/rdd.py", line 5121, in _prepare_for_python_RDD
>     pickled_command = ser.dumps(command)
>   File "/__w/spark/spark/python/pyspark/serializers.py", line 470, in dumps
>     raise pickle.PicklingError(msg)
> _pickle.PicklingError: Could not serialize object: TypeError: cannot pickle '_abc._abc_data' object
> {code}
> {code}
> ======================================================================
> ERROR [0.014s]: test_in_memory_data_source (pyspark.sql.tests.connect.test_parity_python_datasource.PythonDataSourceParityTests.test_in_memory_data_source)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/serializers.py", line 460, in dumps
>     return cloudpickle.dumps(obj, pickle_protocol)
>            ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1537, in dumps
>     cp.dump(obj)
>     ~~~~~~~^^^^^
>   File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1303, in dump
>     return super().dump(obj)
>            ~~~~~~~~~~~~^^^^^
> TypeError: cannot pickle '_abc._abc_data' object
> when serializing dict item '_abc_impl'
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item '__annotate_func__'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item 'reader'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 283, in test_in_memory_data_source
>     self.spark.dataSource.register(InMemoryDataSource)
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/sql/connect/datasource.py", line 45, in register
>     self.sparkSession._client.register_data_source(dataSource)
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 863, in register_data_source
>     ).to_data_source_proto(self)
>       ~~~~~~~~~~~~~~~~~~~~^^^^^^
>   File "/__w/spark/spark/python/pyspark/sql/connect/plan.py", line 2833, in to_data_source_proto
>     plan.python_data_source.CopyFrom(self._data_source.to_plan(session))
>                                      ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/sql/connect/plan.py", line 2807, in to_plan
>     ds.command = CloudPickleSerializer().dumps(self._data_source)
>                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
>   File "/__w/spark/spark/python/pyspark/serializers.py", line 470, in dumps
>     raise pickle.PicklingError(msg)
> _pickle.PicklingError: Could not serialize object: TypeError: cannot pickle '_abc._abc_data' object
> ----------------------------------------------------------------------
> {code}
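The root `TypeError` in both tracebacks can be reproduced outside Spark. This is a minimal sketch, not the Spark code path: `abc.ABCMeta` stores its registry/cache state in a C-level `_abc._abc_data` object under the `_abc_impl` class attribute, and that object has no pickle support. Any serializer that walks a locally defined ABC's `__dict__` by value (as cloudpickle does here, reaching the class through `__annotate_func__` per the traceback) will hit it. The `MyDataSource` class below is a hypothetical stand-in for the test's data source class.

```python
import abc
import pickle


class MyDataSource(abc.ABC):
    """Hypothetical stand-in for a user-defined Python data source ABC."""

    @abc.abstractmethod
    def reader(self):
        ...


# ABCMeta attaches a C-level '_abc._abc_data' object to every ABC
# under the '_abc_impl' class attribute; it defines no __reduce__.
impl = MyDataSource.__dict__["_abc_impl"]

try:
    pickle.dumps(impl)
except TypeError as exc:
    print(exc)  # cannot pickle '_abc._abc_data' object
```

Plain `pickle` never sees this object because it serializes classes by reference; the failure only appears when a by-value serializer such as cloudpickle traverses the class dict, which is why the test breaks at `register()` rather than at class definition.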



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
