Re: [PR] GH-43683: [Python] Use pandas StringDtype when enabled (pandas 3+) [arrow]

via GitHub Wed, 13 Nov 2024 01:50:52 -0800


jorisvandenbossche commented on code in PR #44195:
URL: https://github.com/apache/arrow/pull/44195#discussion_r1839865199



##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4523,9 +4550,11 @@ def test_metadata_compat_range_index_pre_0_12():
     gen_name_1 = '__index_level_1__'
 
     # Case 1: named RangeIndex
-    e1 = pd.DataFrame({
-        'a': a_values
-    }, index=pd.RangeIndex(0, 8, step=2, name='qux'))
+    e1 = pd.DataFrame(
+        {'a': a_values},
+        index=pd.RangeIndex(0, 8, step=2, name='qux'),
+        columns=pd.Index(['a'], dtype=object)

Review Comment:
   OK, changed this to ensure we actually use a `str` dtype columns Index 
object, even if the pandas metadata of the pyarrow table says that the 
original table was using object dtype. 
   
   This ensures that all existing files will use the default `str` dtype for 
the columns (with pandas >= 3). The trade-off is that _if_ you explicitly 
want object dtype with strings, this will no longer roundtrip 
(pandas -> pyarrow/parquet -> pandas).
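
   To make the rule above concrete, here is a minimal sketch of the decision 
in plain Python. It is not the code in this PR: `resolve_columns_dtype` and 
its `pandas_major` parameter are hypothetical names, the metadata excerpt is 
only an illustrative fragment of the `column_indexes` entry, and the version 
check is a simplification of "the pandas string dtype is enabled".

```python
import json

# Illustrative excerpt of the "pandas" schema metadata stored with a table.
# Only the entry describing the columns Index is shown (assumed shape).
PANDAS_METADATA = json.loads("""
{
  "column_indexes": [
    {
      "name": null,
      "field_name": null,
      "pandas_type": "unicode",
      "numpy_type": "object",
      "metadata": {"encoding": "UTF-8"}
    }
  ]
}
""")


def resolve_columns_dtype(column_index_meta, pandas_major):
    """Hypothetical helper: choose the dtype for the restored columns Index."""
    recorded = column_index_meta["numpy_type"]
    holds_strings = column_index_meta["pandas_type"] == "unicode"
    if holds_strings and pandas_major >= 3:
        # String labels get the default str dtype, even though the
        # metadata recorded "object" for the original columns Index.
        return "str"
    # Otherwise fall back to whatever dtype the metadata recorded.
    return recorded


meta = PANDAS_METADATA["column_indexes"][0]
print(resolve_columns_dtype(meta, pandas_major=2))
print(resolve_columns_dtype(meta, pandas_major=3))
```

   Under this sketch, metadata recording an object-dtype columns Index of 
strings cannot be told apart from metadata written by older files, which is 
why an explicit object dtype no longer roundtrips.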



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
