wyzhao opened a new pull request #8672: URL: https://github.com/apache/arrow/pull/8672
I would like to enhance the partition filters accepted by methods such as pyarrow.parquet.ParquetDataset(path, filters). I am proposing the following enhancements:

1. For the operators "in" and "not in", the value should be allowed to be any iterable (typing.Iterable), such as a list, tuple, or set. Currently only a set is supported, and other iterables such as lists and tuples do not work correctly. I would like to change this to accept any iterable.
2. Improve the documentation of the partition filters.
3. When no partition satisfies the filters, raise an exception with a meaningful error message.

I see that the new implementation, _ParquetDatasetV2, already passes my tests when an arbitrary iterable is used for "in" and "not in", so the documentation update applies to the new version as well.
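Items 1 and 3 can be illustrated with a minimal, hypothetical sketch in plain Python. This is not the actual pyarrow implementation or API. It assumes partitions are plain dicts mapping partition keys to values, and it handles only a flat list of (column, op, value) tuples (a single conjunction, not the list-of-lists form) with a small subset of operators. The names filter_partitions, _normalize and _matches are illustrative only.

```python
from collections.abc import Iterable
from typing import Any, Dict, List, Tuple

# Hypothetical types: a partition is a dict of partition key -> value, and a
# filter is a (column, op, value) tuple in the style of pyarrow's `filters`.
Partition = Dict[str, Any]
Filter = Tuple[str, str, Any]


def _normalize(filters: List[Filter]) -> List[Filter]:
    """Materialize the value of every "in"/"not in" filter into a frozenset."""
    normalized = []
    for column, op, value in filters:
        if op in ("in", "not in"):
            # A bare string is iterable but almost certainly a mistake here.
            if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
                raise TypeError(
                    f"Operator {op!r} on column {column!r} expects an iterable "
                    f"of values, got {value!r}"
                )
            # Converting once means lists, tuples, sets and even one-shot
            # generators all behave the same for every partition.
            value = frozenset(value)
        normalized.append((column, op, value))
    return normalized


def _matches(partition_value: Any, op: str, value: Any) -> bool:
    """Evaluate one normalized predicate against a partition value."""
    if op == "in":
        return partition_value in value
    if op == "not in":
        return partition_value not in value
    if op in ("=", "=="):
        return partition_value == value
    if op == "!=":
        return partition_value != value
    raise ValueError(f"Unsupported operator: {op!r}")


def filter_partitions(partitions: List[Partition], filters: List[Filter]) -> List[Partition]:
    """Keep the partitions that satisfy ALL filters (a single conjunction)."""
    filters = list(filters)
    normalized = _normalize(filters)
    selected = [
        part
        for part in partitions
        if all(_matches(part[col], op, val) for col, op, val in normalized)
    ]
    if not selected:
        # Fail loudly instead of silently returning an empty dataset.
        raise ValueError(f"No partition satisfies the filters {filters!r}")
    return selected
```

For example, filter_partitions([{"year": 2019}, {"year": 2020}], [("year", "in", [2019])]) keeps only the 2019 partition, while passing [1999] as the value raises ValueError. Materializing the iterable once, rather than inside the per-partition check, is what lets a generator work the same way as a list or tuple. In pyarrow itself, the corresponding user-facing call would look like pq.ParquetDataset(path, filters=[("year", "in", [2019, 2020])]).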
