[GitHub] [arrow] westonpace commented on issue #12653: Conversion from one dataset to another that will not fit in memory?

GitBox Thu, 17 Mar 2022 18:13:32 -0700


westonpace commented on issue #12653:
URL: https://github.com/apache/arrow/issues/12653#issuecomment-1071923418



   At the moment we generally use too much memory when scanning Parquet.  This 
is because the scanner's readahead is unfortunately based on the row group size 
and not the batch size, so larger row groups mean more data held in memory at 
once.  Using smaller row groups in your source files will help.  #12228 changes 
the readahead to be based on the batch size, but it's been on my back burner 
for a bit.  I'm still optimistic I will get to it for the 8.0.0 release.


