[
https://issues.apache.org/jira/browse/SPARK-54065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-54065:
-----------------------------------
Labels: pull-request-available (was: )
> Fix `test_in_memory_data_source` in Python 3.14
> -----------------------------------------------
>
> Key: SPARK-54065
> URL: https://issues.apache.org/jira/browse/SPARK-54065
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.1.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
> Labels: pull-request-available
>
> {code}
> ======================================================================
> ERROR [0.007s]: test_in_memory_data_source
> (pyspark.sql.tests.test_python_datasource.PythonDataSourceTests.test_in_memory_data_source)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/__w/spark/spark/python/pyspark/serializers.py", line 460, in dumps
> return cloudpickle.dumps(obj, pickle_protocol)
> ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1537, in dumps
> cp.dump(obj)
> ~~~~~~~^^^^^
> File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1303, in dump
> return super().dump(obj)
> ~~~~~~~~~~~~^^^^^
> TypeError: cannot pickle '_abc._abc_data' object
> when serializing dict item '_abc_impl'
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item '__annotate_func__'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item 'reader'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 283, in test_in_memory_data_source
> self.spark.dataSource.register(InMemoryDataSource)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/sql/datasource.py", line 1197, in register
> wrapped = _wrap_function(sc, dataSource)
> File "/__w/spark/spark/python/pyspark/sql/udf.py", line 59, in _wrap_function
> pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
> ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/core/rdd.py", line 5121, in _prepare_for_python_RDD
> pickled_command = ser.dumps(command)
> File "/__w/spark/spark/python/pyspark/serializers.py", line 470, in dumps
> raise pickle.PicklingError(msg)
> _pickle.PicklingError: Could not serialize object: TypeError: cannot pickle '_abc._abc_data' object
> {code}
> {code}
> ======================================================================
> ERROR [0.014s]: test_in_memory_data_source
> (pyspark.sql.tests.connect.test_parity_python_datasource.PythonDataSourceParityTests.test_in_memory_data_source)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/__w/spark/spark/python/pyspark/serializers.py", line 460, in dumps
> return cloudpickle.dumps(obj, pickle_protocol)
> ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1537, in dumps
> cp.dump(obj)
> ~~~~~~~^^^^^
> File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1303, in dump
> return super().dump(obj)
> ~~~~~~~~~~~~^^^^^
> TypeError: cannot pickle '_abc._abc_data' object
> when serializing dict item '_abc_impl'
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item '__annotate_func__'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> when serializing tuple item 0
> when serializing cell reconstructor arguments
> when serializing cell object
> when serializing tuple item 0
> when serializing dict item '__closure__'
> when serializing tuple item 1
> when serializing function state
> when serializing function object
> when serializing dict item 'reader'
> when serializing tuple item 0
> when serializing abc.ABCMeta state
> when serializing abc.ABCMeta object
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 283, in test_in_memory_data_source
> self.spark.dataSource.register(InMemoryDataSource)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/sql/connect/datasource.py", line 45, in register
> self.sparkSession._client.register_data_source(dataSource)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 863, in register_data_source
> ).to_data_source_proto(self)
> ~~~~~~~~~~~~~~~~~~~~^^^^^^
> File "/__w/spark/spark/python/pyspark/sql/connect/plan.py", line 2833, in to_data_source_proto
> plan.python_data_source.CopyFrom(self._data_source.to_plan(session))
> ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/sql/connect/plan.py", line 2807, in to_plan
> ds.command = CloudPickleSerializer().dumps(self._data_source)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
> File "/__w/spark/spark/python/pyspark/serializers.py", line 470, in dumps
> raise pickle.PicklingError(msg)
> _pickle.PicklingError: Could not serialize object: TypeError: cannot pickle '_abc._abc_data' object
> ----------------------------------------------------------------------
> {code}
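> Both traces bottom out in the same root cause: every ABC class carries an opaque C-level registry object, `_abc._abc_data`, stored under the class attribute `_abc_impl`, and the `pickle` machinery (which cloudpickle falls back to for unknown C types) cannot serialize it. The `__annotate_func__` entry in the trace suggests that Python 3.14's deferred annotation evaluation (PEP 649) is what leads cloudpickle to walk the `ABCMeta` class state and hit `_abc_impl`. A minimal sketch of the underlying failure, outside Spark (the `InMemoryDataSource` name here is just a stand-in for the test's class, not the actual PySpark definition):

```python
import abc
import pickle

class InMemoryDataSource(abc.ABC):  # hypothetical stand-in for the test's data source
    @abc.abstractmethod
    def reader(self):
        ...

# Every ABC stores an opaque C-level cache, _abc._abc_data, in _abc_impl.
impl = InMemoryDataSource.__dict__["_abc_impl"]

# pickle cannot reduce this C object, which is the TypeError seen above.
try:
    pickle.dumps(impl)
except TypeError as exc:
    print(exc)  # cannot pickle '_abc._abc_data' object
```

> On earlier Python versions cloudpickle never reached this attribute when serializing the data source class, which is why the test only breaks under 3.14.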
--
This message was sent by Atlassian Jira
(v8.20.10#820010)