pitrou commented on issue #45287:
URL: https://github.com/apache/arrow/issues/45287#issuecomment-2612692963

   > * **After the read is done (i.e. the Jupyter notebook cell running the read completes):** the memory usage still hasn't decreased
   
   Ok, I don't know how Jupyter works in that regard, but I do know that the IPython console (on the command line) keeps past results alive by default in its output cache; see [%reset](https://ipythonbook.com/magic/reset.html) and the sketch below.
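   
   To illustrate: in IPython, any displayed result stays referenced by the output cache, so its buffers cannot be freed until that cache is cleared. A minimal sketch (the parquet path is a placeholder):
   
   ```python
   import pyarrow.parquet as pq
   
   table = pq.read_table("data.parquet")  # placeholder path
   table       # displaying the result stores a reference in IPython's Out cache
   del table   # not enough: the Out cache still holds a reference to the table
   
   # Drop the output cache only (forced, no confirmation prompt):
   %reset -f out
   # Or wipe everything, including user variables:
   %reset -f
   ```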
   
   > I ran the repro on your patch commit as well (https://github.com/apache/arrow/issues/37630) and memory usage is a quarter of what it was without the patch!
   
   Great, thank you!
   
   > However, I think we're still left with the general issue that memory usage is significantly higher than the amount of "real data" loaded (GBs of memory usage for MBs of real data) -- it seems like something is still accumulating?
   
   That might also have to do with how memory allocators work: they often keep a cache of deallocated memory for better performance instead of returning it to the OS. There are a couple of things you could try and report results for (see the sketch after this list):
   * selecting a different [memory pool implementation](https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_DEFAULT_MEMORY_POOL): jemalloc, mimalloc, or system
   * [releasing memory more forcibly](https://arrow.apache.org/docs/cpp/api/memory.html#_CPPv4N5arrow10MemoryPool13ReleaseUnusedEv): this is not recommended in production (it makes later allocations more expensive), but it is useful in experiments like this one to narrow down the cause of the memory consumption
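   
   A rough sketch of both experiments in PyArrow; the parquet path is a placeholder, and `psutil` is only used here to watch the process RSS:
   
   ```python
   import os
   
   # The allocator must be chosen before pyarrow is imported; per the env var
   # docs, valid values are "jemalloc", "mimalloc" and "system".
   os.environ["ARROW_DEFAULT_MEMORY_POOL"] = "system"
   
   import pyarrow as pa
   import pyarrow.parquet as pq
   import psutil
   
   def rss_mib():
       return psutil.Process().memory_info().rss // 2**20
   
   pool = pa.default_memory_pool()
   print(pool.backend_name)               # which allocator is actually in use
   
   table = pq.read_table("data.parquet")  # placeholder path
   print(pool.bytes_allocated(), pool.max_memory(), rss_mib())
   
   del table                              # Arrow-level allocations go away...
   print(pool.bytes_allocated(), rss_mib())  # ...but RSS may stay high (allocator cache)
   
   pool.release_unused()                  # Python binding of MemoryPool::ReleaseUnused
   print(rss_mib())                       # did the cache go back to the OS?
   ```
   
   If RSS drops sharply after `release_unused()`, the "missing" memory was allocator caching rather than live Arrow data.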
   

