pitrou commented on PR #13938:
URL: https://github.com/apache/arrow/pull/13938#issuecomment-1222530533

   So, it seems this is a capability that should be preserved. The problem is 
that the new dataset implementation doesn't allow reading the file back:
   ```python
   >>> pq.read_table('file.parquet', use_legacy_dataset=False)
   Traceback (most recent call last):
     [...]
   ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: int64
   a: int64
   __fragment_index: int32
   __batch_index: int32
   __last_in_fragment: bool
   __filename: string
   
   >>> pq.read_table('file.parquet', use_legacy_dataset=True)
   <ipython-input-12-6eeebe64658f>:1: FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 8.0.0, and the legacy implementation will be removed in a future version.
     pq.read_table('file.parquet', use_legacy_dataset=True)
   pyarrow.Table
   a: int64
   a: int64
   ----
   a: [[4,5,6]]
   a: [[1,2,3]]
   ```
   

