[ https://issues.apache.org/jira/browse/ARROW-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285288#comment-16285288 ]
Joris Van den Bossche commented on ARROW-1908:
----------------------------------------------
There is still the general question that [~cpcloud] raised: should duplicate
column names be allowed at all? I'm not sure whether that discussion needs to
be held.
E.g., when you manually create such a table, it fails to convert to pandas:
{code}
In [19]: table = pa.Table.from_arrays([pa.array([1, 2]), pa.array([.1, .2])],
    ...:                               names=['a', 'a'])
In [20]: table
Out[20]:
pyarrow.Table
a: int64
a: double
In [21]: table.to_pandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-33874de08f1b> in <module>()
----> 1 table.to_pandas()

table.pxi in pyarrow.lib.Table.to_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categoricals)
    518
    519     # ARROW-1751: flatten a single level column MultiIndex for pandas 0.21.0
--> 520     columns = _flatten_single_level_multiindex(columns)
    521
    522     axes = [columns, index]

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in _flatten_single_level_multiindex(index)
    580     # Cheaply check that we do not somehow have duplicate column names
    581     if not index.is_unique:
--> 582         raise ValueError('Found non-unique column index')
    583
    584     return pd.Index([levels[_label] if _label != -1 else None

ValueError: Found non-unique column index
{code}
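Whatever is decided, a caller can detect the problem before attempting the conversion. The following is only a minimal sketch: `find_duplicate_names` is a hypothetical helper, not part of pyarrow, and it works on a plain list of names. With a real table I would expect to pass it `table.schema.names`.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    counts = Counter(names)
    # Counter keeps insertion order, so duplicates come back in the
    # order in which they first appeared.
    return [name for name in counts if counts[name] > 1]


# Column names from the example table above.
duplicates = find_duplicate_names(['a', 'a'])
if duplicates:
    print('Duplicate column names:', duplicates)
```

This only reports the duplicates; whether to raise, rename, or allow them is exactly the open question.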
> [Python] Construction of arrow table from pandas DataFrame with duplicate column names crashes
> ----------------------------------------------------------------------------------------------
>
> Key: ARROW-1908
> URL: https://issues.apache.org/jira/browse/ARROW-1908
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
> Labels: pandas, pull-request-available, python
> Fix For: 0.8.0
>
>
> [~jorisvandenbossche]'s example here:
> https://github.com/pandas-dev/pandas/pull/18201#issuecomment-350259248 shows
> that a {{pyarrow.Table}} with duplicate column names can be constructed.