[jira] [Commented] (ARROW-1754) [Python] Fix buggy Parquet roundtrip when an index name is the same as a column name

ASF GitHub Bot (JIRA) Sun, 10 Dec 2017 13:59:51 -0800

    [ 
https://issues.apache.org/jira/browse/ARROW-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285395#comment-16285395
 ]


ASF GitHub Bot commented on ARROW-1754:
---------------------------------------

jorisvandenbossche opened a new pull request #1408: ARROW-1754: [Python] 
alternative fix for duplicate index/column name that preserves index name if 
available
URL: https://github.com/apache/arrow/pull/1408
 
 
   Related to the discussion about the pandas metadata specification in 
https://github.com/pandas-dev/pandas/pull/18201, and an alternative to 
https://github.com/apache/arrow/pull/1271.
   
   I am not opening this PR because it should necessarily be merged; I just 
want to show that it is not that difficult to both fix 
[ARROW-1754](https://issues.apache.org/jira/browse/ARROW-1754) and preserve 
index names as field names when possible (this was mentioned in 
https://github.com/pandas-dev/pandas/pull/18201 as the reason for the change 
to no longer preserve index names). 
   The diff is partly a revert of https://github.com/apache/arrow/pull/1271, 
adapted to the current codebase.
   
   The main reasons I prefer to preserve index names: 1) usability in pyarrow 
itself (if you want to work with pyarrow Tables created from pandas), and 
2) when exchanging Parquet files with other people or with non-pandas 
systems, it would be much nicer not to have `__index_level_n__` column 
names if possible.
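
   To illustrate the naming rule argued for here, a minimal sketch follows. 
It is not the code in this PR or in pyarrow: `index_field_names` is a 
hypothetical helper, and the assumption is that an index level keeps its own 
name unless that name is missing or already in use, in which case it falls 
back to `__index_level_n__`.

```python
def index_field_names(index_names, column_names):
    """Hypothetical helper: pick a field name for each index level.

    Assumed rule (not taken from the pyarrow source): keep the index
    name when it is set and unused, otherwise use the generic
    ``__index_level_n__`` placeholder.
    """
    taken = set(column_names)
    result = []
    for level, name in enumerate(index_names):
        if name is None or name in taken:
            # Unnamed index level, or its name would clash with an
            # existing field: fall back to the generic placeholder.
            name = "__index_level_{}__".format(level)
        result.append(name)
        taken.add(name)
    return result


# Index name is free, so it is preserved as the field name.
print(index_field_names(["id"], ["a", "b"]))      # ['id']
# Index name duplicates a column name (the ARROW-1754 case).
print(index_field_names(["a"], ["a", "b"]))       # ['__index_level_0__']
# Unnamed level falls back; the named level is preserved.
print(index_field_names([None, "key"], ["a"]))    # ['__index_level_0__', 'key']
```

   Under this assumed rule, only the conflicting or unnamed levels get a 
generated name, so files shared with non-pandas systems keep readable 
column names wherever possible.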
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Python] Fix buggy Parquet roundtrip when an index name is the same as a 
> column name
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-1754
>                 URL: https://issues.apache.org/jira/browse/ARROW-1754
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Wes McKinney
>            Assignee: Phillip Cloud
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> See upstream report 
> https://stackoverflow.com/questions/47013052/issue-with-pyarrow-when-loading-parquet-file-where-index-has-redundant-column



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
