[
https://issues.apache.org/jira/browse/SPARK-54068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043381#comment-18043381
]
Ashrith Bandla edited comment on SPARK-54068 at 12/7/25 6:47 PM:
-----------------------------------------------------------------
Hey [~dongjoon], I came across the logic that skips some tests in
test_feather.py when the PyArrow version is too high, since they would
otherwise fail. With my fix the tests now pass: I resolved the issue by
converting some of the metrics to plain dictionaries before storing them in
DataFrame attrs, so the attrs stay JSON-serializable. The change is backwards
compatible with previous versions of PyArrow as well. I opened a short PR
here: [https://github.com/apache/spark/pull/53377].
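For illustration only, the failure mode and the shape of the fix can be
sketched with a stand-in class (the class and field names below are
hypothetical, not the actual Spark Connect `PlanMetrics` API; the real change
is in the PR):

```python
import json

# Illustrative stand-in for the Spark Connect PlanMetrics object; the
# attribute names here are assumptions, not the real class's fields.
class PlanMetrics:
    def __init__(self, name, metrics):
        self.name = name
        self.metrics = metrics

raw = PlanMetrics("project", {"numOutputRows": 10})

# json.dumps (invoked by pyarrow's construct_metadata when writing
# DataFrame.attrs into the Feather pandas metadata) cannot serialize an
# arbitrary Python object, which is the TypeError in the traceback below.
try:
    json.dumps({"attributes": {"metrics": [raw]}})
except TypeError as exc:
    print(exc)

# Converting the metrics to plain dictionaries before storing them in
# DataFrame.attrs makes the whole payload JSON-serializable.
as_dict = {"name": raw.name, "metrics": raw.metrics}
encoded = json.dumps({"attributes": {"metrics": [as_dict]}})
print(encoded)
```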
> Fix `pyspark.pandas.tests.connect.io.test_parity_feather.FeatherParityTests.test_to_feather` in PyArrow 22.0.0
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-54068
> URL: https://issues.apache.org/jira/browse/SPARK-54068
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.1.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
> Labels: pull-request-available
>
> {code}
> ======================================================================
> ERROR [1.960s]: test_to_feather (pyspark.pandas.tests.connect.io.test_parity_feather.FeatherParityTests.test_to_feather)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/pandas/tests/io/test_feather.py", line 43, in test_to_feather
>     self.psdf.to_feather(path2)
>     ~~~~~~~~~~~~~~~~~~~~^^^^^^^
>   File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2702, in to_feather
>     return validate_arguments_and_invoke_function(
>         self._to_internal_pandas(), self.to_feather, pd.DataFrame.to_feather, args
>     )
>   File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 592, in validate_arguments_and_invoke_function
>     return pandas_func(**args)
>   File "/usr/local/lib/python3.14/dist-packages/pandas/core/frame.py", line 2949, in to_feather
>     to_feather(self, path, **kwargs)
>     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.14/dist-packages/pandas/io/feather_format.py", line 65, in to_feather
>     feather.write_feather(df, handles.handle, **kwargs)
>     ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.14/dist-packages/pyarrow/feather.py", line 156, in write_feather
>     table = Table.from_pandas(df, preserve_index=preserve_index)
>   File "pyarrow/table.pxi", line 4795, in pyarrow.lib.Table.from_pandas
>   File "/usr/local/lib/python3.14/dist-packages/pyarrow/pandas_compat.py", line 663, in dataframe_to_arrays
>     pandas_metadata = construct_metadata(
>         columns_to_convert, df, column_names, index_columns, index_descriptors,
>         preserve_index, types, column_field_names=column_field_names
>     )
>   File "/usr/local/lib/python3.14/dist-packages/pyarrow/pandas_compat.py", line 281, in construct_metadata
>     b'pandas': json.dumps({
>     ~~~~~~~~~~^^
>         'index_columns': index_descriptors,
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     ...<7 lines>...
>         'pandas_version': _pandas_api.version
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     }).encode('utf8')
>     ^^
>   File "/usr/lib/python3.14/json/__init__.py", line 231, in dumps
>     return _default_encoder.encode(obj)
>            ~~~~~~~~~~~~~~~~~~~~~~~^^^^^
>   File "/usr/lib/python3.14/json/encoder.py", line 200, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python3.14/json/encoder.py", line 261, in iterencode
>     return _iterencode(o, 0)
>   File "/usr/lib/python3.14/json/encoder.py", line 180, in default
>     raise TypeError(f'Object of type {o.__class__.__name__} '
>                     f'is not JSON serializable')
> TypeError: Object of type PlanMetrics is not JSON serializable
> when serializing list item 0
> when serializing dict item 'metrics'
> when serializing dict item 'attributes'
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]