timothydijamco commented on issue #45287: URL: https://github.com/apache/arrow/issues/45287#issuecomment-2619999594
> > However, I think we're still left with the general issue that memory usage is significantly higher than the amount of "real data" loaded (GBs of memory usage for MBs of real data) -- it seems like something is still accumulating?
>
> That might also have to do with how memory allocators work (they often keep some cache of deallocated memory for better performance instead of returning it to the OS). There are several things that you could try and report results for:
>
> * selecting different [memory pool implementations](https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_DEFAULT_MEMORY_POOL): jemalloc, mimalloc, system
> * trying to [release memory more forcibly](https://arrow.apache.org/docs/cpp/api/memory.html#_CPPv4N5arrow10MemoryPool13ReleaseUnusedEv): this is not recommended in production cases (because this makes later allocations more expensive), but can be used for experiments like this to find out the possible cause of memory consumption

I printed out info about the default memory pool after each batch is read (read from the `RecordBatchReader` I created from the `Scanner`):

* `total_bytes_allocated` steadily increases over time, which makes sense
* `bytes_allocated` fluctuates but stays capped, i.e. it does not track the overall memory usage of the process, which keeps increasing steadily over time

Calling `arrow::default_memory_pool()->ReleaseUnused()` after every record batch is read also seems to have no effect.

My shaky understanding of Arrow memory pools and allocators says this means the memory usage I'm hoping to reduce is memory that is not allocated on the Arrow memory pool?
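For reference, here is roughly what my instrumentation loop looks like (a minimal sketch; the helper name is mine, and I'm assuming the reader came from `Scanner::ToRecordBatchReader()`):

```cpp
#include <iostream>
#include <memory>

#include <arrow/memory_pool.h>
#include <arrow/record_batch.h>
#include <arrow/status.h>

// Illustrative helper (the name is mine). `reader` is assumed to be the
// arrow::RecordBatchReader produced from the Dataset Scanner, e.g. via
// Scanner::ToRecordBatchReader().
arrow::Status ReadAndReportPoolStats(arrow::RecordBatchReader* reader) {
  arrow::MemoryPool* pool = arrow::default_memory_pool();

  // Shows which allocator is active, e.g. after setting
  // ARROW_DEFAULT_MEMORY_POOL=jemalloc|mimalloc|system in the environment.
  std::cerr << "pool backend: " << pool->backend_name() << "\n";

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream

    // Experiment only: ask the pool to return cached memory to the OS.
    pool->ReleaseUnused();

    std::cerr << "bytes_allocated=" << pool->bytes_allocated()
              << " total_bytes_allocated=" << pool->total_bytes_allocated()
              << "\n";
  }
  return arrow::Status::OK();
}
```

(`backend_name()` is just there to confirm which allocator the `ARROW_DEFAULT_MEMORY_POOL` environment variable actually selected.)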
