2010YOUY01 commented on issue #17334:
URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3233336093

   The current `FairSpillPool` implementation seems problematic. 
   See its behavior: 
https://github.com/apache/datafusion/blob/5021b397b1e63277b217dd3f8111b64b3458d484/datafusion/execution/src/memory_pool/pool.rs#L202
   I think certain timings can cause the non-spillable operators to OOM, 
even though the execution could finish if the spillable operators spilled 
earlier. (@ding-young Just want to make sure: is such a race condition 
possible? 🤔)
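   To make the suspected race concrete, below is a simplified, hypothetical model of the accounting rules described in `FairSpillPool`'s documentation. A spillable consumer may hold at most `(pool_size - unspillable) / num_spillable` bytes. An unspillable consumer may only take what is still unreserved by anyone. This is a sketch for reasoning, not DataFusion's code or API; `FairSpillModel` and its methods are made-up names.

```rust
/// Hypothetical, simplified model of FairSpillPool-style accounting.
/// Not DataFusion's implementation or API.
#[derive(Debug)]
pub struct FairSpillModel {
    pool_size: usize,
    unspillable: usize,    // bytes held by all unspillable consumers combined
    spillable: Vec<usize>, // bytes held by each spillable consumer
}

impl FairSpillModel {
    pub fn new(pool_size: usize, num_spillable: usize) -> Self {
        Self { pool_size, unspillable: 0, spillable: vec![0; num_spillable] }
    }

    fn spillable_total(&self) -> usize {
        self.spillable.iter().sum()
    }

    /// Unspillable consumers can only use memory nobody has reserved yet,
    /// so they fail if spillable consumers already took it.
    pub fn try_grow_unspillable(&mut self, additional: usize) -> Result<(), String> {
        let used = self.unspillable + self.spillable_total();
        let available = self.pool_size.saturating_sub(used);
        if additional > available {
            return Err(format!("unspillable OOM: requested {additional}, available {available}"));
        }
        self.unspillable += additional;
        Ok(())
    }

    /// A spillable consumer is capped at an even share of the memory not
    /// held by unspillable consumers; past that it is told to spill.
    /// `id` must be < num_spillable (indexing panics otherwise).
    pub fn try_grow_spillable(&mut self, id: usize, additional: usize) -> Result<(), String> {
        let spill_available = self.pool_size.saturating_sub(self.unspillable);
        let share = spill_available / self.spillable.len().max(1);
        if self.spillable[id] + additional > share {
            return Err(format!("spillable consumer {id} must spill"));
        }
        self.spillable[id] += additional;
        Ok(())
    }
}
```

   With a 100-byte pool and one spillable consumer, ordering alone decides the outcome. If the spillable consumer reserves 90 bytes first, a later 20-byte unspillable request fails. If the unspillable request comes first, the spillable consumer is capped at 80 bytes and is asked to spill instead.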
   
   > - implement custom MemoryPool to special-case how different operators are 
tracked
   > 
   > Ideally I'd prefer a solution where we can precisely control how much 
memory is allocated to the sort compared to the rest of the operators, 
especially given the subtle memory allocations we faced during the sort merge 
phase - the solution "fake spillable allocation" is a bit coarse-grained in 
that regard.
   
   This fine-grained memory pool approach sounds good. If we put all 
non-spillable consumers (like `RepartitionExec`) in one pool, and all spillable 
consumers (like `SortExec`) in another pool, such a race condition wouldn't be 
possible. 
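   Here is a rough sketch of the split-pool idea, assuming two fixed budgets chosen up front. `SplitPools` and `Class` are hypothetical names. A real version would implement DataFusion's `MemoryPool` trait and pick the class for each registered consumer.

```rust
/// Which budget a consumer draws from (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Unspillable,
    Spillable,
}

/// Hypothetical two-budget pool: each class checks only its own budget,
/// so one class can never starve the other.
#[derive(Debug)]
pub struct SplitPools {
    unspillable_limit: usize,
    unspillable_used: usize,
    spillable_limit: usize,
    spillable_used: usize,
}

impl SplitPools {
    pub fn new(unspillable_limit: usize, spillable_limit: usize) -> Self {
        Self { unspillable_limit, unspillable_used: 0, spillable_limit, spillable_used: 0 }
    }

    /// Reserve `additional` bytes from the budget of `class`, or fail.
    pub fn try_grow(&mut self, class: Class, additional: usize) -> Result<(), String> {
        let (used, limit) = match class {
            Class::Unspillable => (&mut self.unspillable_used, self.unspillable_limit),
            Class::Spillable => (&mut self.spillable_used, self.spillable_limit),
        };
        if *used + additional > limit {
            return Err(format!("{class:?} budget exhausted"));
        }
        *used += additional;
        Ok(())
    }

    /// Return `bytes` to the budget of `class` (e.g. after a spill).
    pub fn shrink(&mut self, class: Class, bytes: usize) {
        let used = match class {
            Class::Unspillable => &mut self.unspillable_used,
            Class::Spillable => &mut self.spillable_used,
        };
        *used = used.saturating_sub(bytes);
    }
}
```

   Each class checks only its own budget. A sort that fills its whole spillable budget therefore no longer affects whether the repartition can reserve memory. The trade-off is that memory left idle in one budget cannot be borrowed by the other.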
   
   Besides, I think it may be okay to remove the memory reservations in 
`RepartitionExec`: its memory usage should be bounded by `batch_mem_size * 
partition_count`, which is a constant.
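   Here is the bound above as back-of-envelope arithmetic. It assumes, as the comment does, that at most one batch per partition is buffered at a time; `repartition_mem_bound` is a hypothetical helper. For example, 8 MiB batches across 16 partitions give 8 MiB × 16 = 128 MiB, regardless of how many rows flow through the operator.

```rust
/// Hypothetical estimate of peak RepartitionExec buffering, assuming at most
/// one batch of `batch_mem_size` bytes is in flight per partition.
pub fn repartition_mem_bound(batch_mem_size: usize, partition_count: usize) -> usize {
    // saturating_mul avoids overflow panics for absurdly large inputs
    batch_mem_size.saturating_mul(partition_count)
}
```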
   
   ----
   BTW, spilling execution is still experimental; we're working on bug fixes 
to make it production-ready. If you can provide a runnable reproducer, we can 
help investigate further.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
