[GitHub] [arrow] westonpace commented on issue #35569: python - read multiple parquets that have different schema?

via GitHub Mon, 15 May 2023 11:35:45 -0700


westonpace commented on issue #35569:
URL: https://github.com/apache/arrow/issues/35569#issuecomment-1548361801


   Which version of pyarrow is this?  Any schema evolution support is going to 
be provided by the newer datasets API (`pyarrow.dataset`) and will probably not 
be added to `parquet.ParquetDataset`.
   
   Do you get an error with:
   
   ```
   import pyarrow.dataset

   # s3_src is the S3 filesystem object you are already using
   pyarrow.dataset.dataset("bucket/folder", filesystem=s3_src, partitioning="hive")
   ```
   
   Is it possible to manually specify a schema that includes all of the fields 
(even if some of those fields are missing in some files)?  If you do that, do 
you still get this error?


