[GitHub] [arrow] amol- commented on pull request #13409: ARROW-16616: [Python] Add lazy Dataset.filter() method

GitBox Wed, 07 Dec 2022 05:55:55 -0800


amol- commented on PR #13409:
URL: https://github.com/apache/arrow/pull/13409#issuecomment-1341003172


   @jorisvandenbossche I checked for `ParquetDataset` and the experience is 
fairly confusing from the end user's point of view. If the dataset is created 
using `ds.parquet_dataset` it will have the filter capabilities, but if it's 
created using `pyarrow.parquet.ParquetDataset` it won't have them. 
But `ParquetDataset` in its V2 implementation is just a proxy to 
`ds.Dataset`, so it could in theory gain filtering support.
   
   It seems that `ParquetDataset`, when `use_legacy_dataset=False`, is mostly 
a duplicate of what `ds.parquet_dataset` can do, so is there a 
reason why we keep it around? Is there a plan to deprecate it in the future?
   
   Asking because if the plan is to deprecate it some day, then it probably 
doesn't make much sense to invest the effort to work toward feature parity with 
`ds.Dataset` and we can consider this task done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.