[jira] [Commented] (ARROW-1754) [Python] Fix buggy Parquet roundtrip when an index name is the same as a column name

ASF GitHub Bot (JIRA) Wed, 24 Jan 2018 14:01:18 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338295#comment-16338295 ]


ASF GitHub Bot commented on ARROW-1754:
---------------------------------------

jorisvandenbossche commented on a change in pull request #1408: ARROW-1754: 
[Python] alternative fix for duplicate index/column name that preserves index 
name if available
URL: https://github.com/apache/arrow/pull/1408#discussion_r163692020
 
 

 ##########
 File path: python/pyarrow/pandas_compat.py
 ##########
 @@ -294,9 +288,29 @@ def _column_name_to_strings(name):
     return str(name)
 
 
+def _index_level_name(index, i, column_names):
+    """Return the name of an index level or a default name if `index.name` is
+    None or is already a column name.
+
+    Parameters
+    ----------
+    index : pandas.Index
+    i : int
+
+    Returns
+    -------
+    name : str
+    """
+    if index.name is not None and index.name not in column_names:
 
 Review comment:
   I did some timings: converting the list to a set typically takes about twice 
as long as a single membership search in the list. So you need at least 3 index 
levels before the set pays off, and I don't think that is the typical use case. 
   I would personally leave it as is, but I can certainly add the suggestion.
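
   For anyone who wants to reproduce this kind of comparison, here is a minimal 
sketch using only the standard library. It is not the benchmark that was 
actually run: the helper name `time_membership`, the column and index names, 
and the list sizes are all assumptions made for illustration, and the absolute 
numbers will vary by machine and Python version.

```python
import timeit


def time_membership(column_names, lookups, repeat=5, number=1000):
    """Time list lookups against set-conversion-plus-lookups.

    Hypothetical helper (not part of pyarrow). Returns a tuple
    (list_time, set_time): the best-of-`repeat` wall time in seconds for
    `number` runs of each strategy.
    """
    def with_list():
        # One linear search in the list per index level.
        return [name in column_names for name in lookups]

    def with_set():
        # The conversion cost is paid once per call, then each lookup is O(1).
        names = set(column_names)
        return [name in names for name in lookups]

    # Both strategies must agree before their timings are worth comparing.
    assert with_list() == with_set()

    list_time = min(timeit.repeat(with_list, repeat=repeat, number=number))
    set_time = min(timeit.repeat(with_set, repeat=repeat, number=number))
    return list_time, set_time


if __name__ == "__main__":
    # Assumed sizes: 20 columns and 1 to 5 index levels whose names do not
    # collide with any column (the worst case for the list search).
    columns = ["col{}".format(i) for i in range(20)]
    for n_levels in (1, 2, 3, 5):
        lookups = ["idx{}".format(i) for i in range(n_levels)]
        list_time, set_time = time_membership(columns, lookups)
        print("levels={}: list={:.6f}s set={:.6f}s".format(
            n_levels, list_time, set_time))
```

   The point at which the set starts to win depends on the number of columns 
and on where in the list a match is found, so treat the output as indicative 
only.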



> [Python] Fix buggy Parquet roundtrip when an index name is the same as a 
> column name
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-1754
>                 URL: https://issues.apache.org/jira/browse/ARROW-1754
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Wes McKinney
>            Assignee: Phillip Cloud
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> See upstream report 
> https://stackoverflow.com/questions/47013052/issue-with-pyarrow-when-loading-parquet-file-where-index-has-redundant-column


