waynexia commented on code in PR #10765: URL: https://github.com/apache/datafusion/pull/10765#discussion_r1630529314
########## datafusion/physical-plan/src/aggregates/mod.rs: ########## @@ -1545,8 +1545,13 @@ mod tests { input_schema, )?); - let result = - common::collect(merged_aggregate.execute(0, task_ctx.clone())?).await?; + // enlarge memory limit in spill mode + let task_ctx = if spill { + new_spill_ctx(2, 2600) Review Comment: This change has two stages to my understanding. The first partial aggr stage has a smaller limit, to meet the expectation that the partial plan has early emit. https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/mod.rs#L1504-L1516 And the new lines enlarge the limit after that "early emit" check. Otherwise the merge aggr would fall because of insufficient memory. (at line 1554) ```rust let result = common::collect(merged_aggregate.execute(0, task_ctx)?).await?; ``` The root cause of this change is somehow the memory requirement becomes larger. From my investigation, the biggest change compared to the previous is from the `RawTable` in `GroupValuesPrimitive` -- it needs about 1000 bytes more: https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/group_values/primitive.rs#L81-L95 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org