cetra3 opened a new issue, #20715: URL: https://github.com/apache/datafusion/issues/20715
### Is your feature request related to a problem or challenge? Right now in Datafusion there are bugs when you are sorting & spilling with multiple partitions. There is even a comment that on the multi level merge that this is not a great solution: https://github.com/apache/datafusion/blob/d412ba5d182a7ab44792647de33f7fb2b58efc27/datafusion/physical-plan/src/sorts/multi_level_merge.rs#L361-L362 There is also some contention between *when* we should spill and when we are out of memory. Both of these are essentially two limits on the memory cap. After we fix memory accounting via solutions in https://github.com/apache/datafusion/issues/20714, we should be able to adjust the memory pool implementation to allow for more memory coordination. ### Describe the solution you'd like Implement a new "memory coordinator" infrastructure which has the following properties: * Partition Aware Scheduling: if there are multiple partitions across one or more active queries, we should ensure that we fairly allocate memory according to partitions * High and Low memory watermarks: we should have two thresholds for our memory limits. The first is the low watermark: when things should *start* to spill, and high watermark: when we should error out because we are at max memory and there is no way to continue without blowing out memory allocations * Ability to "wait" for memory to be free: rather than error out immediately if there is no memory, have the ability to wait until memory is available. This would mean that we would need some sort of method to wait until memory is available. However: this is deadlock prone so we need to be careful, or have it as an opt-in. I.e, have the default behaviour to error out immediately, but allow users of datafusion the ability to wait around for a bit I have experimented with all of these style properties in this branch: https://github.com/apache/datafusion/compare/main...pydantic:datafusion:memory_observations but it was vibe coded to prove the concept (which it does), but probably needs to be broken out into discrete chunks of work and more thoroughly thought out. ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
