[GitHub] [arrow] pitrou commented on pull request #13938: ARROW-17388: [C++][Python] Error on WriteTable if duplicate field names

GitBox Mon, 22 Aug 2022 08:37:15 -0700


pitrou commented on PR #13938:
URL: https://github.com/apache/arrow/pull/13938#issuecomment-1222530533


   So, it seems that writing a table with duplicate field names is a capability
that should be preserved. The problem is that the new dataset implementation
doesn't allow reading the file back:
   ```python
   >>> pq.read_table('file.parquet', use_legacy_dataset=False)
   Traceback (most recent call last):
     [...]
   ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: int64
   a: int64
   __fragment_index: int32
   __batch_index: int32
   __last_in_fragment: bool
   __filename: string
   
   >>> pq.read_table('file.parquet', use_legacy_dataset=True)
   <ipython-input-12-6eeebe64658f>:1: FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 8.0.0, and the legacy implementation will be removed in a future version.
     pq.read_table('file.parquet', use_legacy_dataset=True)
   pyarrow.Table
   a: int64
   a: int64
   ----
   a: [[4,5,6]]
   a: [[1,2,3]]
   ```
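
To illustrate why a name-based lookup fails here while a positional read can still succeed, below is a minimal pure-Python sketch. It is not pyarrow code: `find_duplicate_names` and `resolve_by_name` are hypothetical helpers invented for this illustration, and the error text only loosely mirrors the `Multiple matches for FieldRef.Name(a)` message above.

```python
from collections import Counter


def find_duplicate_names(field_names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    counts = Counter(field_names)
    return [name for name in counts if counts[name] > 1]


def resolve_by_name(field_names, name):
    """Return the index of `name`, raising if the lookup is ambiguous or missing.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    matches = [i for i, n in enumerate(field_names) if n == name]
    if len(matches) > 1:
        # Same situation as the traceback above: the name alone
        # cannot identify a single column.
        raise ValueError(f"Multiple matches for {name!r} at positions {matches}")
    if not matches:
        raise KeyError(name)
    return matches[0]


# Schema of the file from the example: two columns, both named "a".
names = ["a", "a"]
duplicates = find_duplicate_names(names)
```

With the schema above, `duplicates` contains `"a"` and `resolve_by_name(names, "a")` raises, whereas addressing the columns by position (index 0 and 1) remains unambiguous.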
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
