crepererum commented on code in PR #4202:
URL: https://github.com/apache/arrow-datafusion/pull/4202#discussion_r1023844082


##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -270,10 +315,25 @@ fn group_aggregate_batch(
                 // Existing entry for this group value
                 Some((_hash, group_idx)) => {
                     let group_state = &mut group_states[*group_idx];
+
                     // 1.3
                     if group_state.indices.is_empty() {
                         groups_with_rows.push(*group_idx);
                     };
+
+                    // ensure we have enough indices allocated
+                    if group_state.indices.capacity() == 
group_state.indices.len() {
+                        // allocate more
+
+                        // growth factor: 2, but at least 2 elements
+                        let bump_elements = (group_state.indices.capacity() * 
2).max(2);

Review Comment:
   > I wonder if we could somehow encapsulate the memory manager interactions 
into functions on `GroupAggrState` rather than treating it like a struct.
   
   That only works if all interactions with `GroupState` go throw methods, not 
only a few of them due to how Rust handles borrowing (= `fn f(&self)` and `fn 
f(&mut self)` borrow the whole struct, so you cannot mutable borrow any member 
at the same time).
   
   > What if `group_aggregate_batch` returns how much more memory it allocated, 
and accounting is done after method call? ... Also, this way end of the batch 
would be a "safe point" at which we could trigger spill
   
   Fair, let me try that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to