waynexia commented on code in PR #10765:
URL: https://github.com/apache/datafusion/pull/10765#discussion_r1630529314


##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1545,8 +1545,13 @@ mod tests {
             input_schema,
         )?);
 
-        let result =
-            common::collect(merged_aggregate.execute(0, 
task_ctx.clone())?).await?;
+        // enlarge memory limit in spill mode
+        let task_ctx = if spill {
+            new_spill_ctx(2, 2600)

Review Comment:
   This change has two stages to my understanding. The first partial aggr stage 
has a smaller limit, to meet the expectation that the partial plan has early 
emit.
   
   
https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/mod.rs#L1504-L1516
   
   And the new lines enlarge the limit after that "early emit" check. Otherwise 
the merge aggr would fall because of insufficient memory. (at line 1554)
   
   ```rust
   let result = common::collect(merged_aggregate.execute(0, task_ctx)?).await?;
   ```
   
   The root cause of this change is somehow the memory requirement becomes 
larger. From my investigation, the biggest change compared to the previous is 
from the `RawTable` in `GroupValuesPrimitive` -- it needs about 1000 bytes more:
   
https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/group_values/primitive.rs#L81-L95



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to