adriangb commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3470383804
In the meantime, I'm thinking a bit about the idea of "reclaiming space". Something pretty easy would be for spillable operators to call some `MemoryPool::memory_exceeded(&self) -> usize` API that reports how far over the soft limit we are; the operator can then decide whether it wants to spill to help reclaim memory. In the simplest version the operator itself would spill whatever it wants (all of its data or just some of it).

I imagine some operators might have to do a very expensive "switch over completely to a different algorithm and spill all of our data" (sorts?), while others might be able to say "I'm going to spill roughly the amount we are over" or "I'm going to spill some subset of my data". E.g., having recently worked on RepartitionExec, it would be easy and cheap for that operator to spill some of the data it has in memory, but not necessarily all of it.

A more advanced version would have some sort of priority system that tries to match up how much memory needs to be reclaimed with operators (maybe more than one) that can reclaim at least that much memory cheaply.
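A minimal sketch of the simplest version of this idea, in self-contained Rust. Everything here is an assumption for illustration: `SoftLimitPool` and `BufferingOperator` are hypothetical types, not DataFusion's `MemoryPool` trait or any real operator, and `memory_exceeded` is only the API proposed in this comment, not an existing method. Batches are modeled as plain byte counts, and "spilling" just moves bytes from the in-memory queue to a counter.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool with a soft limit (NOT DataFusion's `MemoryPool`).
struct SoftLimitPool {
    soft_limit: usize,
    used: AtomicUsize,
}

impl SoftLimitPool {
    fn new(soft_limit: usize) -> Self {
        Self { soft_limit, used: AtomicUsize::new(0) }
    }

    /// Record that `bytes` more memory is in use. A soft limit never rejects.
    fn grow(&self, bytes: usize) {
        self.used.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Record that `bytes` were released (callers only shrink what they grew).
    fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::Relaxed);
    }

    /// Proposed API: how many bytes we are over the soft limit (0 if under).
    fn memory_exceeded(&self) -> usize {
        self.used.load(Ordering::Relaxed).saturating_sub(self.soft_limit)
    }
}

/// Hypothetical operator that buffers batches and can cheaply spill some of them.
struct BufferingOperator {
    /// Sizes in bytes of the batches currently held in memory, oldest first.
    buffered: VecDeque<usize>,
    /// Total bytes this operator has (notionally) written to disk.
    spilled_bytes: usize,
}

impl BufferingOperator {
    fn new() -> Self {
        Self { buffered: VecDeque::new(), spilled_bytes: 0 }
    }

    /// Buffer a batch of `batch_bytes` and account for it in the pool.
    fn push(&mut self, pool: &SoftLimitPool, batch_bytes: usize) {
        pool.grow(batch_bytes);
        self.buffered.push_back(batch_bytes);
    }

    /// Spill whole batches, oldest first, until at least the current overage
    /// has been reclaimed or nothing is left. Returns the bytes reclaimed.
    fn maybe_spill(&mut self, pool: &SoftLimitPool) -> usize {
        let target = pool.memory_exceeded();
        let mut reclaimed = 0;
        while reclaimed < target {
            match self.buffered.pop_front() {
                Some(bytes) => {
                    // A real operator would write the batch to a spill file here.
                    pool.shrink(bytes);
                    self.spilled_bytes += bytes;
                    reclaimed += bytes;
                }
                None => break,
            }
        }
        reclaimed
    }
}
```

In this sketch the operator chooses the "spill roughly the amount we are over" strategy. Because it spills whole batches, it may reclaim somewhat more than the overage. An operator that has to switch algorithms (a sort, say) would instead ignore the exact number and spill everything whenever `memory_exceeded()` is non-zero.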
