Re: [PR] Implement disk spilling for all grouping ordering modes in GroupedHashAggregateStream [datafusion]

via GitHub Tue, 16 Dec 2025 16:02:06 -0800


pepijnve commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2625140889



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -550,26 +569,37 @@ impl GroupedHashAggregateStream {
             .collect::<Vec<_>>()
             .join(", ");
         let name = format!("GroupedHashAggregateStream[{partition}] ({agg_fn_names})");
-        let reservation = MemoryConsumer::new(name)
-            .with_can_spill(true)
-            .register(context.memory_pool());
         let group_ordering = GroupOrdering::try_new(&agg.input_order_mode)?;
+        let oom_mode = match group_ordering {
+            GroupOrdering::None => {
+                if agg.mode == AggregateMode::Partial {
+                    OutOfMemoryMode::EmitEarly
+                } else {
+                    OutOfMemoryMode::Spill
+                }
+            }
+            GroupOrdering::Partial(_) | GroupOrdering::Full(_) => OutOfMemoryMode::Spill,

Review Comment:
   After thinking about and replying to the comment on `can_emit`, I realised
that the logic there actually allows an overshoot of the memory reservation
when we're reading back the spilled data for `GroupOrdering::Full`. It doesn't
really make sense to first spill to disk because of OOM and then allow an
overshoot on readback. We might as well allow the overshoot initially and not
spill at all.
   
   So the question is what the behaviour should be on OOM for
`GroupOrdering::Full`: ignore it and allow the overshoot, or report it and
fail the query?
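
   To make the two options concrete, here is a rough, self-contained sketch of
how the mode selection could look. The enums are simplified stand-ins rather
than the real DataFusion types, and `FullOomPolicy`, `choose_oom_mode`,
`OutOfMemoryMode::Ignore` and `OutOfMemoryMode::ReportError` are hypothetical
names used only for illustration.

```rust
// Simplified stand-ins for illustration; not the actual DataFusion definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GroupOrdering {
    None,
    Partial,
    Full,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggregateMode {
    Partial,
    Final,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutOfMemoryMode {
    EmitEarly,
    Spill,
    // Hypothetical: keep going and tolerate the reservation overshoot.
    Ignore,
    // Hypothetical: surface the out-of-memory error and fail the query.
    ReportError,
}

// Hypothetical switch between the two options raised in the comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FullOomPolicy {
    AllowOvershoot,
    FailQuery,
}

/// Picks the OOM behaviour per ordering mode. Only the `Full` arm differs
/// from the diff under review, where it maps to `Spill`.
pub fn choose_oom_mode(
    ordering: GroupOrdering,
    mode: AggregateMode,
    full_policy: FullOomPolicy,
) -> OutOfMemoryMode {
    match (ordering, mode) {
        // Unordered partial aggregation can emit intermediate state early.
        (GroupOrdering::None, AggregateMode::Partial) => OutOfMemoryMode::EmitEarly,
        // Unordered final aggregation and partially ordered input spill to disk.
        (GroupOrdering::None, _) | (GroupOrdering::Partial, _) => OutOfMemoryMode::Spill,
        // Fully ordered input: skip spilling, since readback would overshoot anyway.
        (GroupOrdering::Full, _) => match full_policy {
            FullOomPolicy::AllowOvershoot => OutOfMemoryMode::Ignore,
            FullOomPolicy::FailQuery => OutOfMemoryMode::ReportError,
        },
    }
}
```

Either way, `GroupOrdering::Full` would no longer go through the spill path,
so the overshoot on readback would not arise in the first place.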



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


