andygrove commented on PR #2662: URL: https://github.com/apache/datafusion-comet/pull/2662#issuecomment-3482054526
> > @EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit.
>
> Unfortunately I was also unable to reproduce this locally. The images I sent previously were saved on my machine from a while back^^ I do have the following pointers:
>
> 1. Use multiple sequential scan operators with something that ends with a loop that consumes fully (i.e., IcebergCompat -> Union -> Shuffle Write).
> 2. Use a lot of data with a lot of RAM, but few CPU cores.
> 3. Use an unbounded memory pool. I think this issue is more prevalent without spilling, so the operators will accumulate a lot of data without returning.

Thanks @EmilyMatt. Yes, with the unified pool we will spill to disk, and that will release the JVM wrapper objects, so maybe this is no longer an issue. Thanks for helping me understand the issue. It has resulted in improved documentation in the contributor guide that explains it: https://datafusion.apache.org/comet/contributor-guide/ffi.html

I will close this PR and issue https://github.com/apache/datafusion-comet/issues/2661, but feel free to reopen if this is still an issue for you.
