lnicola commented on issue #7338:
URL: https://github.com/apache/arrow/issues/7338#issuecomment-638319674

I guess it doesn't say that filtering _doesn't_ load the whole file, but:

> The `pyarrow.dataset` module provides functionality to efficiently work with tabular, potentially larger than memory and multi-file datasets:

But `filter` takes an expression and it would seem possible to implement without loading the full table (e.g. using `dataset.scan`), so it's quite surprising that it works like this.

Anyway, we can close this if it's by design, or we can leave it open if it's simply something not implemented yet.
