gabotechs commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3742644471
> It might make sense to not have a buffer limit per partition, but on a BufferExec level? E.g. a 16MB limit instead of 16 * 1MB That would be easy to do, however, I fear that it can very easily end up in deadlocks. For example, if partition 0 exhausts all the memory budget, polling any other partition will block until someone pulls something out of partition 0, which might never happen as whoever could potentially poll partition 0 is to busy deadlocked on partition 1. A more health behavior IMO would be to have a memory budget per-partition and just put the limit lower: rather than having a global 10Mb, just have a per-partition 1Mb limit. > I am also wondering in the presence of limits / early exits, more "eager" evaluation might do some high amount of work loading probe sides not needed in the end, perhaps we can detect this and not apply BufferExec in those cases? Or just load a minimal amount based on the limit? 🤔 that's interesting, we do might be able react appropriately to `with_fetch()` in `BufferExec` in order to buffer at most `fetch` rows as an earlier limit to the configured memory budget. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
