crepererum commented on code in PR #4202:
URL: https://github.com/apache/arrow-datafusion/pull/4202#discussion_r1023844082
##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -270,10 +315,25 @@ fn group_aggregate_batch(
// Existing entry for this group value
Some((_hash, group_idx)) => {
let group_state = &mut group_states[*group_idx];
+
// 1.3
if group_state.indices.is_empty() {
groups_with_rows.push(*group_idx);
};
+
+ // ensure we have enough indices allocated
+ if group_state.indices.capacity() ==
group_state.indices.len() {
+ // allocate more
+
+ // growth factor: 2, but at least 2 elements
+ let bump_elements = (group_state.indices.capacity() *
2).max(2);
Review Comment:
> I wonder if we could somehow encapsulate the memory manager interactions
into functions on `GroupAggrState` rather than treating it like a struct.
That only works if all interactions with `GroupState` go throw methods, not
only a few of them due to how Rust handles borrowing (= `fn f(&self)` and `fn
f(&mut self)` borrow the whole struct, so you cannot mutable borrow any member
at the same time).
> What if `group_aggregate_batch` returns how much more memory it allocated,
and accounting is done after method call? ... Also, this way end of the batch
would be a "safe point" at which we could trigger spill
Fair, let me try that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]