pepijnve commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2625140889
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -550,26 +569,37 @@ impl GroupedHashAggregateStream {
    .collect::<Vec<_>>()
    .join(", ");
let name = format!("GroupedHashAggregateStream[{partition}] ({agg_fn_names})");
- let reservation = MemoryConsumer::new(name)
-     .with_can_spill(true)
-     .register(context.memory_pool());
let group_ordering = GroupOrdering::try_new(&agg.input_order_mode)?;
+ let oom_mode = match group_ordering {
+     GroupOrdering::None => {
+         if agg.mode == AggregateMode::Partial {
+             OutOfMemoryMode::EmitEarly
+         } else {
+             OutOfMemoryMode::Spill
+         }
+     }
+     GroupOrdering::Partial(_) | GroupOrdering::Full(_) => OutOfMemoryMode::Spill,
Review Comment:
After thinking about and replying to the comment on `can_emit`, I realised that
the logic there actually allows the memory reservation to overshoot when we
read back the spilled data for `GroupOrdering::Full`. It doesn't really make
sense to first spill to disk because of OOM and then allow an overshoot on
readback; we might as well allow the overshoot up front and not spill at all.
So the question is: what should the behaviour be on OOM for
`GroupOrdering::Full`? Ignore it and allow the overshoot, or report it and fail
the query?
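To make the two options concrete, here is a minimal sketch with hypothetical stand-in types. `ToyReservation`, `FullOrderingOomPolicy` and `reserve_for_full` are illustrative names, not DataFusion APIs. The toy reservation loosely models the `try_grow`/`grow` pair on DataFusion's `MemoryReservation`: `try_grow` respects the pool limit, while `grow` always succeeds and can overshoot it.

```rust
// Hypothetical stand-ins for illustration only; these are not DataFusion types.

/// What to do when a `GroupOrdering::Full` aggregation cannot grow its reservation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FullOrderingOomPolicy {
    /// Ignore the pool limit and let the reservation overshoot.
    AllowOvershoot,
    /// Surface the out-of-memory error and fail the query.
    Fail,
}

/// Toy memory reservation with a fixed pool limit, loosely modelled on the
/// `try_grow`/`grow` pair of DataFusion's `MemoryReservation`.
#[derive(Debug)]
pub struct ToyReservation {
    pub used: usize,
    pub limit: usize,
}

impl ToyReservation {
    /// Fallible growth: refuses to exceed `limit` and leaves `used` unchanged.
    pub fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already in use",
                self.used, self.limit
            ))
        } else {
            self.used += bytes;
            Ok(())
        }
    }

    /// Infallible growth: always succeeds, even past `limit`.
    pub fn grow(&mut self, bytes: usize) {
        self.used += bytes;
    }
}

/// Reserve memory for more groups under the chosen policy. Neither branch
/// spills: the overshoot is either accepted immediately or the query fails.
pub fn reserve_for_full(
    reservation: &mut ToyReservation,
    bytes: usize,
    policy: FullOrderingOomPolicy,
) -> Result<(), String> {
    match reservation.try_grow(bytes) {
        Ok(()) => Ok(()),
        Err(err) => match policy {
            FullOrderingOomPolicy::AllowOvershoot => {
                reservation.grow(bytes);
                Ok(())
            }
            FullOrderingOomPolicy::Fail => Err(err),
        },
    }
}
```

With `Fail`, the reservation is left unchanged and the caller propagates the error. With `AllowOvershoot`, `used` ends up above `limit` without any spill files being written, which is the outcome the comment argues spill-then-overshoot-on-readback reaches anyway.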