milesgranger opened a new pull request, #13938:
URL: https://github.com/apache/arrow/pull/13938

   Without this patch, the following is possible:
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   t = pa.Table.from_pydict({'a': [1,2,3]})
   t = t.add_column(0, 'a', pa.array([4, 5, 6]))  # Adding column with same field name
   
   pq.write_table(t, 'file.parquet')  # OK
   pq.read_table('file.parquet')  # Error
   ...
   ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: int64
   a: int64
   __fragment_index: int32
   __batch_index: int32
   __last_in_fragment: bool
   __filename: string
   ```
   
   This patch prevents `pq.write_table(...)` from writing a table with duplicate field names:
   ```python
   pq.write_table(t, 'file.parquet')
   ...
   ArrowInvalid: Cannot write parquet table with duplicate field names: a
   ```
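
For callers who want to detect the problem before writing, a minimal sketch of a pre-check follows. The helper name `duplicate_field_names` is hypothetical and not part of pyarrow or of this patch. It works on a plain list of names, so it makes no assumptions about how the patch performs its own check.

```python
from collections import Counter


def duplicate_field_names(names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


# Field names as they would appear for the table in the example above.
names = ['a', 'a']
dupes = duplicate_field_names(names)
if dupes:
    print("Duplicate field names:", ", ".join(dupes))
```

With a real table, the input could be `t.schema.names`, and `t.rename_columns([...])` is one way to make the names unique before calling `pq.write_table`.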
   

