mustafasrepo commented on code in PR #6904:
URL: https://github.com/apache/arrow-datafusion/pull/6904#discussion_r1259861551


##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -306,460 +370,194 @@ impl RecordBatchStream for GroupedHashAggregateStream {
 }
 
 impl GroupedHashAggregateStream {
-    // Update the row_aggr_state according to groub_by values (result of group_by_expressions)
+    /// Calculates the group indicies for each input row of
+    /// `group_values`.
+    ///
+    /// At the return of this function,
+    /// `self.scratch_space.current_group_indices` has the same number
+    /// of entries as each array in `group_values` and holds the
+    /// correct group_index for that row.
+    ///
+    /// This is one of the core hot loops in the algorithm
     fn update_group_state(
         &mut self,
         group_values: &[ArrayRef],
         allocated: &mut usize,
-    ) -> Result<Vec<usize>> {
+    ) -> Result<()> {

Review Comment:
   It seems to me that this function could return `Result<ScratchSpace>`. With
this change, we could remove `scratch_space` from the
`GroupedHashAggregateStream` state. However, keeping it in the state may have
some benefits, so I am not sure about this change. I am just mentioning it in
case it seems better to you.
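
   To make the trade-off concrete, here is a minimal sketch of the two options.
Everything in it is a simplified, hypothetical stand-in and not the actual
DataFusion code: group keys are plain `i64` values instead of `ArrayRef`s, the
`Result` wrapper and the `allocated` accounting are omitted, and the names
`StatefulStream` and `ReturningStream` are made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `ScratchSpace`.
#[derive(Default)]
struct ScratchSpace {
    /// One group index per input row of the current batch.
    current_group_indices: Vec<usize>,
}

/// Option A: the scratch space lives in the stream state.
#[derive(Default)]
struct StatefulStream {
    groups: HashMap<i64, usize>,
    scratch_space: ScratchSpace,
}

impl StatefulStream {
    fn update_group_state(&mut self, group_values: &[i64]) {
        // `clear` keeps the allocated capacity, so later batches can
        // reuse the buffer instead of allocating a new one.
        self.scratch_space.current_group_indices.clear();
        for v in group_values {
            let next = self.groups.len();
            let idx = *self.groups.entry(*v).or_insert(next);
            self.scratch_space.current_group_indices.push(idx);
        }
    }
}

/// Option B: the function builds and returns the scratch space.
#[derive(Default)]
struct ReturningStream {
    groups: HashMap<i64, usize>,
}

impl ReturningStream {
    fn update_group_state(&mut self, group_values: &[i64]) -> ScratchSpace {
        // A fresh buffer is allocated on every call.
        let mut scratch = ScratchSpace {
            current_group_indices: Vec::with_capacity(group_values.len()),
        };
        for v in group_values {
            let next = self.groups.len();
            let idx = *self.groups.entry(*v).or_insert(next);
            scratch.current_group_indices.push(idx);
        }
        scratch
    }
}
```

   Option B gives a simpler struct and makes the data flow explicit, but in
this sketch it allocates a new `Vec` for every batch. Option A reuses the same
buffer across batches, which may matter because the function is described as
one of the core hot loops. I have not measured either variant.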



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

