jorisvandenbossche commented on code in PR #44195:
URL: https://github.com/apache/arrow/pull/44195#discussion_r1839865199
##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4523,9 +4550,11 @@ def test_metadata_compat_range_index_pre_0_12():
     gen_name_1 = '__index_level_1__'
     # Case 1: named RangeIndex
-    e1 = pd.DataFrame({
-        'a': a_values
-    }, index=pd.RangeIndex(0, 8, step=2, name='qux'))
+    e1 = pd.DataFrame(
+        {'a': a_values},
+        index=pd.RangeIndex(0, 8, step=2, name='qux'),
+        columns=pd.Index(['a'], dtype=object)
Review Comment:
OK, changed this to ensure we actually use a `str` dtype columns Index object,
even if the pandas metadata of the pyarrow table says that the original table
was using object dtype.
This ensures that all existing files will use (with pandas >= 3) the default
`str` dtype for the columns, but it also has the trade-off that _if_ you
explicitly want to use object dtype with strings, this will no longer
roundtrip (pandas -> pyarrow/parquet -> pandas).