andygrove commented on PR #2662:
URL: https://github.com/apache/datafusion-comet/pull/2662#issuecomment-3482054526

   > > @EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit.
   > 
   > Unfortunately, I was also unable to reproduce this locally. The images I sent previously were saved on my machine from a while back^^ I do have the following pointers:
   > 
   > 1. Use multiple sequential scan operators with something that ends with a loop that fully consumes its input (e.g., IcebergCompat -> Union -> Shuffle Write).
   > 2. Use a lot of data with a lot of RAM, but few CPU cores.
   > 3. Use an unbounded memory pool. I think this issue is more prevalent without spilling, so the operators will accumulate a lot of data without returning.
   
   Thanks @EmilyMatt.
   
   Yes, with the unified pool, we will spill to disk and that will release the JVM wrapper objects, so maybe this is not an issue now. Thanks for helping me understand the issue. This has resulted in improved documentation in the contributor guide that explains this issue.
   
   https://datafusion.apache.org/comet/contributor-guide/ffi.html
   
   I will close this PR and issue https://github.com/apache/datafusion-comet/issues/2661, but feel free to reopen if this is still an issue for you.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

