Copilot commented on code in PR #46884:
URL: https://github.com/apache/arrow/pull/46884#discussion_r3298718286
##########
python/pyarrow/pandas_compat.py:
##########
@@ -378,7 +378,10 @@ def _index_level_name(index, i, column_names):
     if index.name is not None and index.name not in column_names:
         return _column_name_to_strings(index.name)
     else:
-        return f'__index_level_{i:d}__'
+        j = i
+        while f'__index_level_{j:d}__' in column_names:
+            j += 1
+        return f'__index_level_{j:d}__'
Review Comment:
Bumping the generated index column name changes the meaning of the numeric
suffix (it no longer matches the index level position). Later in this module,
`_get_index_level()` assumes that any `__index_level_{n}__` name maps to
`df.index.get_level_values(n)`. With this change, a dataframe that already has
`__index_level_0__` will generate `__index_level_1__` for the first unnamed
index level, but `_get_index_level(df, '__index_level_1__')` would incorrectly
select level 1 (or raise for a single-level index) when a user passes an
explicit schema. Consider updating `_get_index_level()` to resolve generated
index names using the same collision-avoidance logic as `_index_level_name`
(based on `df.columns` and already-assigned index names), so schema-based
conversions keep working.
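
To make the mismatch concrete, here is a minimal pure-Python sketch. It does not call pyarrow or pandas: `generated_name` copies the patched fallback from the diff, `positional_level` is a simplified stand-in for reading the suffix as a level position, and `resolve_level` is a hypothetical helper showing one way the lookup could replay the collision-avoidance logic. Treating already-assigned index names as taken, and using `str()` in place of `_column_name_to_strings`, are assumptions made for illustration.

```python
def generated_name(i, taken):
    """Patched fallback from the diff: bump the suffix until the name is unused."""
    j = i
    while f"__index_level_{j:d}__" in taken:
        j += 1
    return f"__index_level_{j:d}__"


def positional_level(name):
    """Simplified stand-in: read the numeric suffix as the index level position."""
    return int(name[len("__index_level_"):-2])


def resolve_level(name, index_names, column_names):
    """Hypothetical resolver: regenerate each level's name, return the matching level."""
    taken = set(column_names)
    for i, index_name in enumerate(index_names):
        if index_name is not None and index_name not in column_names:
            level_name = str(index_name)  # assumption: stands in for _column_name_to_strings
        else:
            level_name = generated_name(i, taken)
        if level_name == name:
            return i
        taken.add(level_name)  # assumption: assigned index names also count as taken
    return None


# One unnamed index level, plus a column that already uses the generated name.
index_names = [None]
column_names = ["__index_level_0__", "value"]

name = generated_name(0, column_names)          # suffix is bumped past the collision
by_position = positional_level(name)            # suffix read as a position
by_replay = resolve_level(name, index_names, column_names)

print(name, by_position, by_replay)
```

With a single-level index, the positional reading points at a level that does not exist, while replaying the naming logic recovers level 0.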