[jira] [Commented] (ARROW-1908) [Python] Construction of arrow table from pandas DataFrame with duplicate column names crashes

Joris Van den Bossche (JIRA) Sun, 10 Dec 2017 07:39:40 -0800

    [ 
https://issues.apache.org/jira/browse/ARROW-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285288#comment-16285288
 ]


Joris Van den Bossche commented on ARROW-1908:
----------------------------------------------

There is still the general question that [~cpcloud] raised about whether 
duplicate column names should be allowed at all. I am not sure whether that 
discussion needs to be held.

E.g., when you manually create such a table, it fails to convert to pandas:


{code}
In [19]: table = pa.Table.from_arrays([pa.array([1, 2]), pa.array([.1, .2])], names=['a', 'a'])

In [20]: table
Out[20]: 
pyarrow.Table
a: int64
a: double

In [21]: table.to_pandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-33874de08f1b> in <module>()
----> 1 table.to_pandas()

table.pxi in pyarrow.lib.Table.to_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categoricals)
    518 
    519     # ARROW-1751: flatten a single level column MultiIndex for pandas 0.21.0
--> 520     columns = _flatten_single_level_multiindex(columns)
    521 
    522     axes = [columns, index]

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in _flatten_single_level_multiindex(index)
    580         # Cheaply check that we do not somehow have duplicate column names
    581         if not index.is_unique:
--> 582             raise ValueError('Found non-unique column index')
    583 
    584         return pd.Index([levels[_label] if _label != -1 else None

ValueError: Found non-unique column index

{code}
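
As an illustration only, here is a minimal sketch of how one might detect duplicate column names before calling {{to_pandas()}}. The helper {{duplicate_names}} is hypothetical and not part of pyarrow; it works on a plain list of names, which I assume could be taken from {{table.schema.names}}.

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    counts = Counter(names)
    # Counter keeps insertion order, so duplicates come out in first-seen order.
    return [name for name in counts if counts[name] > 1]


# Assumption: with a real table, this list could come from table.schema.names.
names = ['a', 'a']
dups = duplicate_names(names)
if dups:
    print("duplicate column names:", dups)
```

Whether such a check should raise, rename the columns, or simply be dropped in favour of allowing duplicates is exactly the open question above.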


> [Python] Construction of arrow table from pandas DataFrame with duplicate 
> column names crashes
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1908
>                 URL: https://issues.apache.org/jira/browse/ARROW-1908
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Phillip Cloud
>            Assignee: Phillip Cloud
>              Labels: pandas, pull-request-available, python
>             Fix For: 0.8.0
>
>
> [~jorisvandenbossche]'s example here: 
> https://github.com/pandas-dev/pandas/pull/18201#issuecomment-350259248 shows 
> that a {{pyarrow.Table}} with duplicate column names can be constructed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
