2010YOUY01 commented on PR #23233: URL: https://github.com/apache/datafusion/pull/23233#issuecomment-4862144163
> The only thing I found (with codex) is that `PartialReduceHashAggregateStream` is also used regardless of the memory pool sizes even though it doesn't implement memory reservations / OOM behavior yet > > So I suggest we disable this special stream when there is a memory limit -- similarly to Partial hash stream guard: Thanks for the review. I agree we should not use the new path if there is the memory limit. One thing I noticed is that the original PR added spilling and streaming merge support for PartialReduce aggregation, but I don’t see any tests for the PartialReduce + memory limit case, or much discussion about the expected OOM behavior. * https://github.com/apache/datafusion/pull/20019 For this use case, I think early emit is probably both more efficient and simpler than supporting spilling / streaming merge. Should we make this path explicitly error on OOM instead? That would keep the behavior more constrained for now. Since the existing feature does not seem to have test coverage, relying on the current implementation feels a bit risky, and we can reopen the discussion later if users depend on this behavior. 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
