amol- commented on PR #13409: URL: https://github.com/apache/arrow/pull/13409#issuecomment-1341003172
@jorisvandenbossche I checked `ParquetDataset`, and the experience is fairly confusing from the end user's point of view. If the dataset is created using `ds.parquet_dataset`, it has the filter capabilities, but if it's created using `pyarrow.parquet.ParquetDataset`, it doesn't. Yet `ParquetDataset` in its V2 implementation is just a proxy to `ds.Dataset`, so in theory it could gain filtering support too.

It seems that `ParquetDataset` mostly duplicates what `ds.parquet_dataset` can do when `use_legacy_dataset=False`, so is there a reason we keep it around? Is there a plan to deprecate it in the future? I'm asking because if the plan is to deprecate it some day, then it probably doesn't make much sense to invest effort in reaching feature parity with `ds.Dataset`, and we can consider this task done.
