adriangb commented on issue #17334:
URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3470383804

   In the meantime, I'm thinking a bit about the idea of "reclaiming space".
Something pretty easy would be for spillable operators to call some
`MemoryPool::memory_exceeded(&self) -> usize` API that reports how far
over the soft limit we are; the operator can then decide whether it wants to
spill to help reclaim memory. In the simplest version the operator itself would
then spill whatever it wants (all of its data or just some of it). I imagine
some operators might have to do a very expensive "switch over completely to a
different algorithm and spill all of our data" (sorts?) while others might be
able to say "I'm going to spill ~ the amount we are over" or "I'm going to
spill some subset of my data". E.g. having recently worked on RepartitionExec,
it would be easy and cheap for that operator to spill some of the data it has
in memory but not necessarily all of it. A more advanced version would have
some sort of priority system that tries to match up how much memory needs to be
reclaimed with operators (maybe more than one) that can reclaim at least that
much memory cheaply.
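
A minimal sketch of the simplest version described above, assuming a pool that tracks usage against a soft limit. `memory_exceeded` is the API proposed in this comment, not something DataFusion's `MemoryPool` trait is known to provide. `SoftLimitPool`, `BufferingOperator`, and `maybe_spill` are hypothetical names made up for illustration, and batches are represented only by their byte sizes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool with a soft limit (NOT DataFusion's actual `MemoryPool`).
pub struct SoftLimitPool {
    soft_limit: usize,
    used: AtomicUsize,
}

impl SoftLimitPool {
    pub fn new(soft_limit: usize) -> Self {
        Self { soft_limit, used: AtomicUsize::new(0) }
    }

    /// Record a reservation. Growing past the soft limit is allowed.
    pub fn grow(&self, bytes: usize) {
        self.used.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Record that `bytes` were released, e.g. after spilling to disk.
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::Relaxed);
    }

    /// The proposed API: bytes over the soft limit, or 0 if under it.
    pub fn memory_exceeded(&self) -> usize {
        self.used
            .load(Ordering::Relaxed)
            .saturating_sub(self.soft_limit)
    }
}

/// Hypothetical operator that can cheaply spill one buffered batch at a time.
pub struct BufferingOperator {
    /// Byte sizes of buffered batches (stand-in for real record batches).
    buffered: Vec<usize>,
    pub spilled_bytes: usize,
}

impl BufferingOperator {
    pub fn new() -> Self {
        Self { buffered: Vec::new(), spilled_bytes: 0 }
    }

    /// Buffer a batch in memory and account for it in the pool.
    pub fn buffer(&mut self, pool: &SoftLimitPool, batch_bytes: usize) {
        pool.grow(batch_bytes);
        self.buffered.push(batch_bytes);
    }

    /// Spill roughly the amount the pool is over its soft limit.
    /// Returns the number of bytes reclaimed.
    pub fn maybe_spill(&mut self, pool: &SoftLimitPool) -> usize {
        let target = pool.memory_exceeded();
        let mut reclaimed = 0;
        while reclaimed < target {
            match self.buffered.pop() {
                Some(bytes) => {
                    // A real operator would write the batch to a spill file here.
                    pool.shrink(bytes);
                    reclaimed += bytes;
                }
                None => break, // nothing left to spill
            }
        }
        self.spilled_bytes += reclaimed;
        reclaimed
    }
}
```

An operator like a sort, which has to switch algorithms entirely, would instead spill everything as soon as `memory_exceeded()` returns a nonzero value. The priority system mentioned above would sit on top of this and choose which operators to ask.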


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

