Grayden Shand created ARROW-17388:
-------------------------------------

             Summary: Prevent corrupting files with Multiple matches for FieldRef.Name
                 Key: ARROW-17388
                 URL: https://issues.apache.org/jira/browse/ARROW-17388
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
         Environment: MacOS, Python 3.10.3
            Reporter: Grayden Shand


*Version*: pyarrow 9.0.0

*Description*

Users can add a column with the same name as an existing column to a table via `pyarrow.Table.add_column()`.

That table can then be written to a Parquet file with `pyarrow.parquet.write_table()`.

However, the written file cannot be read with `pyarrow.parquet.read_table()`, because it contains multiple columns with the same name.

I am flagging this as a bug because I believe anything that is successfully written by `write_table()` should be readable by `read_table()`.

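Until `write_table()` itself rejects such tables, one client-side mitigation is to check the schema before writing. The following is a minimal sketch that assumes refusing to write is the desired behavior. `duplicate_names` and `write_table_checked` are hypothetical helper names, not pyarrow APIs. The pyarrow import is deferred until the check has passed, so the name check also works on its own.

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, in order of first appearance."""
    counts = Counter(names)
    dupes = []
    for name in names:
        if counts[name] > 1 and name not in dupes:
            dupes.append(name)
    return dupes


def write_table_checked(table, where, **kwargs):
    """Hypothetical wrapper around pyarrow.parquet.write_table.

    Raises ValueError instead of writing a file whose repeated column
    names make it unreadable with pyarrow.parquet.read_table (as
    observed with pyarrow 9.0.0).
    """
    dupes = duplicate_names(table.schema.names)
    if dupes:
        raise ValueError(f"refusing to write {where!r}: duplicate column names {dupes}")
    # Import only after the check passes (assumes pyarrow is installed).
    import pyarrow.parquet as pq

    pq.write_table(table, where, **kwargs)
```

With `t` from the minimum reproducible example, `write_table_checked(t.add_column(0, 'a', pa.array([1.1, 2.2, 3.3])), 'test.parquet')` would raise `ValueError` before `write_table()` is called, so no unreadable file is created.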
 

*Minimum reproducible example*

```
>>> import pyarrow.parquet as pq
>>> import pyarrow as pa
>>> t = pa.Table.from_pydict({'a': [1, 2, 3]})
>>> pq.write_table(t.add_column(0, 'a', pa.array([1.1, 2.2, 3.3])),
...                'test.parquet')
>>> pq.read_table('test.parquet')
pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: double
a: int64
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
```

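If the data must be written anyway, one possible workaround is to give every column a unique name before calling `write_table()`. This sketch assumes that a numeric suffix (`a`, `a_1`, `a_2`, ...) is acceptable to downstream readers. `dedupe_names` and `make_names_unique` are hypothetical helpers. `Table.rename_columns()` is an existing pyarrow method that takes the full list of new column names.

```python
def dedupe_names(names, sep="_"):
    """Return a copy of names in which every entry is unique.

    Repeats get a numeric suffix, and any candidate already used earlier
    in the list is skipped, so the result never contains duplicates.
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        i = 1
        while candidate in used:
            candidate = f"{name}{sep}{i}"
            i += 1
        used.add(candidate)
        result.append(candidate)
    return result


def make_names_unique(table):
    """Return a table whose columns have unique names (hypothetical helper)."""
    # pyarrow.Table.rename_columns expects one new name per existing column.
    return table.rename_columns(dedupe_names(table.schema.names))
```

The first occurrence keeps its original name. In the reproducible example, the inserted `double` column (at index 0) would therefore stay `a`, and the original `int64` column would become `a_1`.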


