alamb commented on code in PR #11627: URL: https://github.com/apache/datafusion/pull/11627#discussion_r1694253923
########## datafusion/physical-plan/src/aggregates/row_hash.rs: ########## @@ -484,6 +612,12 @@ impl Stream for GroupedHashAggregateStream { ( if self.input_done { ExecutionState::Done + } else if self Review Comment: nit is that putting this into a function (like `self.should_skip_aggregation()`) would make this logic easier to follow ########## datafusion/physical-plan/src/aggregates/row_hash.rs: ########## @@ -90,6 +94,69 @@ struct SpillState { merging_group_by: PhysicalGroupBy, } +struct SkipAggregationProbe { Review Comment: FYI @kazuyukitanimura -- I wonder if you have time to review this change to hash aggregate spilling as you originally contributed #7400 Context: * Describes this issue: https://github.com/apache/datafusion/issues/6937 * Additional background documentation: https://github.com/apache/datafusion/pull/11695 ########## datafusion/physical-expr-common/src/aggregate/groups_accumulator/prim_op.rs: ########## @@ -134,6 +134,46 @@ where self.update_batch(values, group_indices, opt_filter, total_num_groups) } + fn convert_to_state( + &self, + values: &[ArrayRef], + opt_filter: Option<&BooleanArray>, + ) -> Result<Vec<ArrayRef>> { + let values = values[0].as_primitive::<T>(); + let mut state = PrimitiveBuilder::<T>::with_capacity(values.len()) + .with_data_type(self.data_type.clone()); + + match opt_filter { + Some(filter) => { + values Review Comment: I think @Dandandan is suggesting using https://docs.rs/arrow/latest/arrow/compute/kernels/filter/fn.filter.html However, that would likely require a second copy of the values (apply/collect filter result and then apply `prim_fn`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org