mustafasrepo commented on code in PR #6904:
URL: https://github.com/apache/arrow-datafusion/pull/6904#discussion_r1259861551
##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -306,460 +370,194 @@ impl RecordBatchStream for GroupedHashAggregateStream {
}
impl GroupedHashAggregateStream {
- // Update the row_aggr_state according to groub_by values (result of group_by_expressions)
+ /// Calculates the group indicies for each input row of
+ /// `group_values`.
+ ///
+ /// At the return of this function,
+ /// `self.scratch_space.current_group_indices` has the same number
+ /// of entries as each array in `group_values` and holds the
+ /// correct group_index for that row.
+ ///
+ /// This is one of the core hot loops in the algorithm
fn update_group_state(
&mut self,
group_values: &[ArrayRef],
allocated: &mut usize,
- ) -> Result<Vec<usize>> {
+ ) -> Result<()> {
Review Comment:
It seems to me that this function could return `Result<ScratchSpace>`. With
that change, we could remove `scratch_space` from the
`GroupedHashAggregateStream` state. However, keeping it in the state may have
some benefits, so I am not sure about this change. I am just mentioning it in
case it seems better to you.
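
To make the two options concrete, here is a rough sketch, not the actual
DataFusion code. `Stream`, the `i64` group keys, and
`update_group_state_returning` are hypothetical simplifications, and the
`Result` wrapper and `allocated` argument are left out for brevity. Variant A
mirrors the diff (the scratch space lives in the stream state); variant B
returns a fresh `ScratchSpace` instead. One possible benefit of variant A,
which is my assumption and not something measured here, is that the index
buffer can be reused across batches.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's scratch space:
/// one group index per input row.
#[derive(Debug, Default)]
struct ScratchSpace {
    current_group_indices: Vec<usize>,
}

/// Hypothetical, heavily simplified stream. Groups are keyed by a single
/// i64 value instead of a set of `ArrayRef` columns.
#[derive(Debug, Default)]
struct Stream {
    group_ids: HashMap<i64, usize>,
    scratch_space: ScratchSpace,
}

impl Stream {
    /// Variant A (as in the diff): results are written into
    /// `self.scratch_space`, so the buffer is cleared and reused per batch.
    fn update_group_state(&mut self, group_values: &[i64]) {
        self.scratch_space.current_group_indices.clear();
        for v in group_values {
            // A new group gets the next free index.
            let next = self.group_ids.len();
            let idx = *self.group_ids.entry(*v).or_insert(next);
            self.scratch_space.current_group_indices.push(idx);
        }
    }

    /// Variant B (the suggestion): build and return a fresh `ScratchSpace`,
    /// so the stream would not need a `scratch_space` field for this step.
    /// This allocates a new buffer on every call.
    fn update_group_state_returning(&mut self, group_values: &[i64]) -> ScratchSpace {
        let mut scratch = ScratchSpace {
            current_group_indices: Vec::with_capacity(group_values.len()),
        };
        for v in group_values {
            let next = self.group_ids.len();
            let idx = *self.group_ids.entry(*v).or_insert(next);
            scratch.current_group_indices.push(idx);
        }
        scratch
    }
}
```

With variant A the caller reads `stream.scratch_space.current_group_indices`
after the call. With variant B the caller owns the returned value, which makes
the data flow more explicit at the cost of one allocation per batch.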