pitrou commented on issue #44599: URL: https://github.com/apache/arrow/issues/44599#issuecomment-2479409424
Ok, I think we're talking about two separate issues here:

1. the memory consumption when a Parquet metadata file is loaded (the 2 GB in your reproducer)
2. the fact that said memory consumption seems to remain persistent even when memory is explicitly released

As I said, I think the second issue is unrelated to PyArrow, and you can probably mitigate it either by changing the default memory pool (e.g. switch to "system" if it doesn't reduce performance for you) or by tuning memory allocator options (for example by experimenting with the mimalloc environment variables I linked to).

However, the first issue may perhaps deserve improving in PyArrow. For that, it would be nice if you could compare the memory consumption with PyArrow and with parquet-rs. Also, hopefully you can upload a reproducer file for us to look at?
