[GitHub] [arrow] 0x26res commented on issue #32067: [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset

via GitHub Fri, 03 Feb 2023 04:36:25 -0800


0x26res commented on issue #32067:
URL: https://github.com/apache/arrow/issues/32067#issuecomment-1415814636


   Now that this change has taken effect in 11.0, we get this warning when loading data with `use_legacy_dataset=True`.
   
   ```
   FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 11.0.0, and the legacy implementation will be removed in a future version.
   ```
   
   I'm in the process of migrating to `use_legacy_dataset=False`, but was wondering what differences to expect between the two implementations. Is this documented somewhere?
   
   I have noticed one significant difference in behaviour. The legacy implementation would raise an error if the Parquet files had heterogeneous schemas. The new implementation instead tries to convert all files to the schema of the first file it finds (or to the `schema` argument, when provided).
   
   Are there other differences to expect?

