Re: [I] Spark cannot reclaim memory from native operators (spill callback returns 0) [datafusion-comet]

via GitHub Mon, 06 Apr 2026 15:38:33 -0700


nathanb9 commented on issue #3873:
URL: https://github.com/apache/datafusion-comet/issues/3873#issuecomment-4194877691


   I drew what this could look like, starting from the point where Spark memory pressure occurs.
   That is, in step 1 a Spark operator tries to grow, cannot allocate, and so a spill attempt begins. Spill is called on [every consumer](https://wforget.github.io/apache-spark-internals/memory/TaskMemoryManager/#source-java_1).
   ```
   Spark TaskMemoryManager   CometTaskMemoryManager   "Execution Registry"    TrackConsumersPool    Spillable Native Op    Comet Native Pool
            |                         |                        |                      |                      |                      |
            |---- 1. spill(size) ---->|                        |                      |                      |                      |
            |                         |-- 2. spillMemory ----->|                      |                      |                      |
            |                         |   (execution_id,size)  |                      |                      |                      |
            |                         |                        |-- 3. reclaim ------->|                      |                      |
            |                         |                        |   (size,exclude_cur) |                      |                      |
            |                         |                        |                      |-- 4. reclaimer ----->|                      |
            |                         |                        |                      |                      |                      |
            |                         |                        |                      |<-- 5. free/shrink ---|                      |
            |                         |                        |                      |-- 6. shrink ------------------------------->|
            |                         |<-- 7. reclaimed bytes -|<-- 7. reclaimed bytes|                      |                      |
            |<-- 8. reclaimed bytes --|                        |                      |                      |                      |
            |<---------------------- 10. releaseExecutionMemory(...) -----------------|<-- 9. releaseMemory- |                      |
   ```
   
   Step 2's `spillMemory` is the JNI interface.
   Step 4's `reclaimer` is what we would need in DataFusion for this approach.
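
   To make steps 2 through 7 concrete, here is a minimal Rust sketch of the flow. It is a toy model under stated assumptions, not DataFusion or Comet code: `Reclaimer`, `SpillableOp`, `TrackingPool`, `ExecutionRegistry` and `spill_memory` are hypothetical names invented for illustration, `exclude` is one guess at what `exclude_cur` means, and no real DataFusion API is used. Steps 9 and 10 (releasing the memory back to Spark) are not modeled.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical hook a spillable operator would register (step 4).
/// Returns the number of bytes it actually freed.
pub trait Reclaimer: Send + Sync {
    fn reclaim(&self, target: usize) -> usize;
}

/// Toy spillable operator holding `held` bytes of buffered state.
pub struct SpillableOp {
    held: Mutex<usize>,
}

impl SpillableOp {
    pub fn new(held: usize) -> Self {
        Self { held: Mutex::new(held) }
    }
}

impl Reclaimer for SpillableOp {
    fn reclaim(&self, target: usize) -> usize {
        // A real operator would write buffered batches to disk here (step 5).
        let mut held = self.held.lock().unwrap();
        let freed = target.min(*held);
        *held -= freed;
        freed
    }
}

/// Stand-in for a consumer-tracking pool: reserved bytes plus reclaimers.
#[derive(Default)]
pub struct TrackingPool {
    reserved: Mutex<usize>,
    consumers: Mutex<Vec<(usize, Arc<dyn Reclaimer>)>>,
}

impl TrackingPool {
    pub fn register(&self, id: usize, bytes: usize, r: Arc<dyn Reclaimer>) {
        *self.reserved.lock().unwrap() += bytes;
        self.consumers.lock().unwrap().push((id, r));
    }

    pub fn reserved(&self) -> usize {
        *self.reserved.lock().unwrap()
    }

    /// Step 3: ask consumers (except `exclude`) to free up to `size` bytes.
    pub fn reclaim(&self, size: usize, exclude: Option<usize>) -> usize {
        // Snapshot the list so no lock is held while operators spill.
        let consumers = self.consumers.lock().unwrap().clone();
        let mut total = 0;
        for (id, r) in consumers {
            if total >= size {
                break;
            }
            if Some(id) == exclude {
                continue;
            }
            total += r.reclaim(size - total);
        }
        // Step 6: shrink the pool's accounting by what was freed.
        let mut reserved = self.reserved.lock().unwrap();
        *reserved = reserved.saturating_sub(total);
        total
    }
}

/// Stand-in for the "Execution Registry": maps an execution id to its pool.
#[derive(Default)]
pub struct ExecutionRegistry {
    pools: Mutex<HashMap<i64, Arc<TrackingPool>>>,
}

impl ExecutionRegistry {
    pub fn insert(&self, execution_id: i64, pool: Arc<TrackingPool>) {
        self.pools.lock().unwrap().insert(execution_id, pool);
    }

    /// Step 2: what a JNI `spillMemory(execution_id, size)` could forward to.
    /// Returns the reclaimed bytes (steps 7 and 8), or 0 for an unknown id.
    pub fn spill_memory(&self, execution_id: i64, size: usize) -> usize {
        let pool = self.pools.lock().unwrap().get(&execution_id).cloned();
        pool.map_or(0, |p| p.reclaim(size, None))
    }
}
```

   In this sketch the value returned by `spill_memory` is what the spill callback would hand back to Spark, so it is nonzero whenever some registered operator can actually free memory.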
   
   
   I'll write a more detailed explanation in a PR; for now I will open a ticket in DataFusion.


