[GitHub] [arrow] lnicola commented on issue #7338: [Python] DataSet uses too much memory when filtering

GitBox Wed, 03 Jun 2020 09:48:09 -0700


lnicola commented on issue #7338:
URL: https://github.com/apache/arrow/issues/7338#issuecomment-638319674



I guess the documentation doesn't say that filtering _doesn't_ load the whole file, but it does say:
   
> The `pyarrow.dataset` module provides functionality to efficiently work with tabular, potentially larger than memory and multi-file datasets:
   
But `filter` takes an expression, so it seems possible to implement it without loading the full table (e.g. using `dataset.scan`). That makes it quite surprising that it loads the whole file instead.
   
Anyway, we can close this if it's by design, or we can leave it open if it's simply something not implemented yet.

