[
https://issues.apache.org/jira/browse/ARROW-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285395#comment-16285395
]
ASF GitHub Bot commented on ARROW-1754:
---------------------------------------
jorisvandenbossche opened a new pull request #1408: ARROW-1754: [Python]
alternative fix for duplicate index/column name that preserves index name if
available
URL: https://github.com/apache/arrow/pull/1408
Related to the discussion about the pandas metadata specification in
https://github.com/pandas-dev/pandas/pull/18201, and an alternative to
https://github.com/apache/arrow/pull/1271.
I'm not opening this PR because it should necessarily be merged; I just want to
show that it is not that difficult to both fix
[ARROW-1754](https://issues.apache.org/jira/browse/ARROW-1754) and preserve
index names as field names when possible (losing index names was given in
https://github.com/pandas-dev/pandas/pull/18201 as the reason for making the
change that stops preserving them).
The diff is partly a revert of https://github.com/apache/arrow/pull/1271,
adapted to the current codebase.
The main reasons I prefer to preserve index names: 1) usability in pyarrow
itself (if you want to work with pyarrow Tables created from pandas) and
2) when exchanging parquet files with other people or other non-pandas
systems, it is much nicer not to have `__index_level_n__` column
names where possible.
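As an illustration of the "preserve the name when possible" rule, here is a minimal sketch in plain Python. It assumes the rule is: keep an index level's own name as the field name when it is a string that does not clash with a column name (or an earlier index field), and otherwise fall back to the positional `__index_level_{i}__` placeholder. The helper `index_field_names` is hypothetical and is not pyarrow's actual API or the exact code in the PR.

```python
def index_field_names(index_names, column_names):
    """Hypothetical helper: choose field names for pandas index levels.

    A level keeps its own name if it is a string that is not already used
    by a column or by an earlier index field; otherwise it gets the
    positional placeholder __index_level_{i}__. (Collisions with columns
    that are themselves literally named __index_level_{i}__ are not handled.)
    """
    taken = set(column_names)
    fields = []
    for i, name in enumerate(index_names):
        if isinstance(name, str) and name not in taken:
            field = name  # name is free: preserve it as the field name
        else:
            field = f"__index_level_{i}__"  # unnamed or duplicate: fall back
        taken.add(field)
        fields.append(field)
    return fields


# The ARROW-1754 case: index named "date" alongside a "date" column.
print(index_field_names(["date"], ["date", "value"]))  # ['__index_level_0__']
# A non-clashing index name is preserved.
print(index_field_names(["id"], ["date", "value"]))  # ['id']
```

Falling back only on a clash keeps readable field names for the common case, while still avoiding the duplicate field name that breaks the roundtrip.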
> [Python] Fix buggy Parquet roundtrip when an index name is the same as a
> column name
> ------------------------------------------------------------------------------------
>
> Key: ARROW-1754
> URL: https://issues.apache.org/jira/browse/ARROW-1754
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Wes McKinney
> Assignee: Phillip Cloud
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See upstream report
> https://stackoverflow.com/questions/47013052/issue-with-pyarrow-when-loading-parquet-file-where-index-has-redundant-column