pitrou commented on issue #44599:
URL: https://github.com/apache/arrow/issues/44599#issuecomment-2479409424

   Ok, I think we're talking about two separate issues here:
   1. the memory consumption when a Parquet metadata file is loaded (the 2 GB 
in your reproducer)
   2. the fact that said memory consumption seems to remain persistent even 
when memory is explicitly released
   
   As I said, I think the second issue is unrelated to PyArrow, and you can 
probably mitigate it either by changing the default memory pool (e.g. switching 
to "system" if it doesn't reduce performance for you) or by tuning memory 
allocator options (for example by experimenting with the mimalloc environment 
variables I linked to).
   
   However, the first issue may deserve improvement in PyArrow. For that, it 
would be nice if you could compare the memory consumption with PyArrow and 
with parquet-rs. Also, could you upload a reproducer file for us to look at?

