Rachelint commented on PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2323376253

   Hi @alamb main comments 
https://github.com/apache/datafusion/pull/11943#pullrequestreview-2254643945 
for this prhave been fixed,  minding have a quick look? It would be appreciated.
   
   The detail about main progress:
   - Added a phase for blocked approach in GroupAggregator document 
https://github.com/Rachelint/arrow-datafusion/blob/b7a443a16420a5fb400297d05ee59c2122c9de6f/datafusion/physical-plan/src/aggregates/row_hash.rs#L346
   - Added physical plan level unit test and fuzzy test for blocked approach, 
added unit tests for new introduced structs and functions.
   - Added an option `enable_aggregation_intermediate_states_blocked_approach` 
for this optimization 
https://github.com/Rachelint/arrow-datafusion/blob/a2d81a59cac598dbf0d7a237d5d7b71b7149a82a/datafusion/common/src/config.rs#L356C13-L356C68
   - Remove the prevoius two mode (flat and blocked), like 
https://github.com/Rachelint/arrow-datafusion/blob/a2d81a59cac598dbf0d7a237d5d7b71b7149a82a/datafusion/functions-aggregate/src/count.rs#L379
   
   Things planned in next prs:
   - Improve the impl, now the `BlockedGroupIndex` will be computed multiple 
times... I want to avoid it, but it will lead to api change, and not a trivial 
ting... The change may be like:
   ```Rust
   // current
       fn update_batch(
           &mut self,
           values: &[ArrayRef],
           group_indices: &[usize],
           opt_filter: Option<&BooleanArray>,
           total_num_groups: usize,
       ) -> Result<()>;
   
   // new
       fn update_batch(
           &mut self,
           values: &[ArrayRef],
           group_indices: &[BlockedGroupIndex],
           opt_filter: Option<&BooleanArray>,
           total_num_groups: usize,
       ) -> Result<()>;
   
   ```
   - More complete fuzz tests
   - Support blocked apporach in more `GroupValues`s and `GroupsAccumulator`s.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to