[
https://issues.apache.org/jira/browse/ARROW-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Miles Granger closed ARROW-17388.
---------------------------------
Resolution: Duplicate
> [Python] Prevent corrupting files with Multiple matches for FieldRef.Name
> -------------------------------------------------------------------------
>
> Key: ARROW-17388
> URL: https://issues.apache.org/jira/browse/ARROW-17388
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Environment: MacOS, Python 3.10.3
> Reporter: Grayden Shand
> Assignee: Miles Granger
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> *Version*: pyarrow 9.0.0
>
> *Description*
> Users can add a column with the same name as an existing column to a
> table via `pyarrow.Table.add_column()`.
>
> Additionally, that table can be written to a parquet file with
> `pyarrow.parquet.write_table()`.
>
> However, the written file cannot be read with `pyarrow.parquet.read_table()`
> due to having multiple columns with the same name.
>
> Flagging this as a bug because I believe anything that is successfully
> written by `write_table()` should be readable by `read_table()`.
>
> *Minimum reproducible example*
> ```
> >>> import pyarrow.parquet as pq
> >>> import pyarrow as pa
> >>> t = pa.Table.from_pydict({'a': [1,2,3]})
> >>> pq.write_table(t.add_column(0, 'a', pa.array([1.1,2.2,3.3])),
> ...                'test.parquet')
> >>> pq.read_table('test.parquet')
> pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: double
> a: int64
> __fragment_index: int32
> __batch_index: int32
> __last_in_fragment: bool
> __filename: string
> ```
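
Until the write path rejects such tables, a caller can guard against the problem by checking for repeated column names before calling `write_table()`. The following is only a minimal sketch: the helper names `find_duplicate_names` and `check_unique_column_names` are hypothetical and not part of pyarrow, and this is not the fix tracked by the duplicate issue. It works on a plain list of names, such as the one returned by `Table.column_names`.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_names(names: Iterable[str]) -> List[str]:
    """Return each name that occurs more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


def check_unique_column_names(names: Iterable[str]) -> None:
    """Raise ValueError if any column name is repeated (hypothetical guard)."""
    duplicates = find_duplicate_names(names)
    if duplicates:
        raise ValueError(f"duplicate column names: {duplicates}")
```

With the table from the example above, one would call `check_unique_column_names(table.column_names)` just before `pq.write_table(table, 'test.parquet')`, so the failure surfaces at write time instead of at read time.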
--
This message was sent by Atlassian Jira
(v8.20.10#820010)