Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2323376253
Hi @alamb main comments https://github.com/apache/datafusion/pull/11943#pullrequestreview-2254643945 for this prhave been fixed, minding have a quick look? It would be appreciated. The detail about main progress: - Added a phase for blocked approach in GroupAggregator document https://github.com/Rachelint/arrow-datafusion/blob/b7a443a16420a5fb400297d05ee59c2122c9de6f/datafusion/physical-plan/src/aggregates/row_hash.rs#L346 - Added physical plan level unit test and fuzzy test for blocked approach, added unit tests for new introduced structs and functions. - Added an option `enable_aggregation_intermediate_states_blocked_approach` for this optimization https://github.com/Rachelint/arrow-datafusion/blob/a2d81a59cac598dbf0d7a237d5d7b71b7149a82a/datafusion/common/src/config.rs#L356C13-L356C68 - Remove the prevoius two mode (flat and blocked), like https://github.com/Rachelint/arrow-datafusion/blob/a2d81a59cac598dbf0d7a237d5d7b71b7149a82a/datafusion/functions-aggregate/src/count.rs#L379 Things planned in next prs: - Improve the impl, now the `BlockedGroupIndex` will be computed multiple times... I want to avoid it, but it will lead to api change, and not a trivial ting... The change may be like: ```Rust // current fn update_batch( &mut self, values: &[ArrayRef], group_indices: &[usize], opt_filter: Option<&BooleanArray>, total_num_groups: usize, ) -> Result<()>; // new fn update_batch( &mut self, values: &[ArrayRef], group_indices: &[BlockedGroupIndex], opt_filter: Option<&BooleanArray>, total_num_groups: usize, ) -> Result<()>; ``` - More complete fuzz tests - Support blocked apporach in more `GroupValues`s and `GroupsAccumulator`s. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org