[
https://issues.apache.org/jira/browse/ARROW-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225799#comment-16225799
]
Phillip Cloud edited comment on ARROW-1754 at 10/30/17 11:29 PM:
-----------------------------------------------------------------
I think we should solve this by always making index column names follow the
pattern for unnamed columns, namely {{__index_level_<N>__}}, and by changing
{{index_columns}} to be a list of dictionaries, each mapping the raw arrow
column name to either {{None}} or the actual column name.
I'll update the pandas metadata spec accordingly.
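A minimal sketch of the proposed layout, assuming each index level gets one
single-entry dictionary. The helper names {{build_index_columns}} and
{{restore_index_names}} are hypothetical and are not part of pyarrow; they only
illustrate why an index named like a data column no longer collides with it.

```python
from typing import Dict, List, Optional


def build_index_columns(
    index_names: List[Optional[str]],
) -> List[Dict[str, Optional[str]]]:
    """Map each index level to a generated raw arrow column name.

    Hypothetical helper: every level is stored as __index_level_<N>__,
    and the real name (or None for an unnamed level) is kept as the value.
    """
    return [
        {"__index_level_{}__".format(i): name}
        for i, name in enumerate(index_names)
    ]


def restore_index_names(
    index_columns: List[Dict[str, Optional[str]]],
) -> List[Optional[str]]:
    """Recover the original index names from the metadata entries."""
    return [next(iter(entry.values())) for entry in index_columns]


# An index named "a" alongside a data column that is also named "a".
data_columns = ["a", "b"]
index_columns = build_index_columns(["a", None])

# Raw arrow column names: data columns plus the generated index names.
raw_names = data_columns + [next(iter(e)) for e in index_columns]
```

Because the stored index columns never reuse the user-visible name, the raw
arrow column names stay unique even when an index name matches a data column,
and the original names are still recoverable from the metadata.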
> [Python] Fix buggy Parquet roundtrip when an index name is the same as a
> column name
> ------------------------------------------------------------------------------------
>
> Key: ARROW-1754
> URL: https://issues.apache.org/jira/browse/ARROW-1754
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Wes McKinney
> Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> See upstream report
> https://stackoverflow.com/questions/47013052/issue-with-pyarrow-when-loading-parquet-file-where-index-has-redundant-column