yli1994 commented on issue #14726:
URL: https://github.com/apache/arrow/issues/14726#issuecomment-1334688842

   > If you want to reduce memory usage when reading a file, you should not
   > read it as an entire table, but as a sequence of batches. See here:
   > https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
   
   Thank you for your reply! I am confused about how Hugging Face's datasets
   library (which uses pyarrow as its backend and Parquet as its file format)
   can load data without increasing memory consumption.

