dbr commented on issue #623: URL: https://github.com/apache/arrow-rs/issues/623#issuecomment-888788592
I ran the example code with the [`memory-profiler`](https://github.com/koute/memory-profiler) tool, using a huge batch size. If I understand it right, this explains the remaining mystery (why a giant batch size causes the process to use lots of memory). In my "ELI5 level" knowledge of memory allocators:

- With a huge batch size, the readers allocate large contiguous chunks of memory as part of the parsing.
- The `RecordBatch`es are allocated while the parsing is happening, so they have to "fit around" those allocations.
- When the reader's buffers are deallocated, they leave big "holes" in the allocated memory.
- Since Rust won't magically rearrange items in memory, the process ends up holding on to all that memory, holes and all.
- The example script is especially bad since there are no subsequent allocations that might fill some of those holes.
- With a smaller batch size, the "holes" created by the parser are much smaller, so the overhead is insignificant.

I might try to make a PR to add some basic docs to the `with_batch_size` methods when I have time, to incorporate some of the advice above - but otherwise I think this issue can be closed, as it seems to be working "as intended". Thanks @Dandandan !
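
For reference, here is a minimal sketch of what picking a smaller batch size looks like with the CSV reader. The file name and schema are made up, and the exact builder methods may differ between arrow versions, but `with_batch_size` is the knob discussed above:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::csv::ReaderBuilder;
use arrow::datatypes::{DataType, Field, Schema};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical input file and schema, for illustration only.
    let file = File::open("data.csv")?;
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("value", DataType::Float64, false),
    ]));

    // Keep the per-batch buffers modest so the temporary allocations made
    // during parsing (and the "holes" they leave behind) stay small.
    let reader = ReaderBuilder::new()
        .with_schema(schema)
        .with_batch_size(8_192) // rather than e.g. 1_000_000
        .build(file)?;

    // The reader yields one RecordBatch per batch_size rows.
    for batch in reader {
        let batch = batch?;
        println!("read a batch of {} rows", batch.num_rows());
    }
    Ok(())
}
```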
