Rachelint commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r2075154369


##########
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs:
##########
@@ -93,20 +94,27 @@ where
         opt_filter: Option<&BooleanArray>,
         total_num_groups: usize,
     ) -> Result<()> {

Review Comment:
   > yes, i mean block size. I would expect something like batchsize (4-8k), or maybe even bigger to have lower overhead? Did you run some experiments?
   
   Yes, I tried it.
   
   For now I set `block_size = batch_size`.
   
   I also tried a smaller `batch_size` such as 1024, and this PR still shows an improvement compared to `main`.
   
   This is because this PR also eliminates the call to `Array::slice`, which is non-trivial because it has to compute the null count. For details, see: https://github.com/apache/arrow-rs/pull/6155
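
   To illustrate the point about `Array::slice`, here is a minimal, std-only sketch. It does not use the arrow-rs API: `ToyArray`, `emit_by_slicing`, and `emit_blocks` are hypothetical names made up for this example, and the toy `slice` copies data, whereas slicing in arrow is zero-copy. The only thing it models is that each slice has to rescan its validity mask to get a null count, while a block that is already batch-sized can be handed out as-is.

```rust
/// Hypothetical stand-in for a nullable array: values plus a validity mask.
/// This is NOT the arrow-rs API; it only models the null-count bookkeeping.
#[derive(Debug, Clone, PartialEq)]
struct ToyArray {
    values: Vec<i64>,
    valid: Vec<bool>,
    null_count: usize,
}

impl ToyArray {
    fn new(values: Vec<i64>, valid: Vec<bool>) -> Self {
        assert_eq!(values.len(), valid.len());
        // Counting nulls requires a scan over the validity mask.
        let null_count = valid.iter().filter(|v| !**v).count();
        Self { values, valid, null_count }
    }

    /// Toy slice: the null count of the slice is recomputed by rescanning
    /// the validity mask for the sliced range (via `new`).
    fn slice(&self, offset: usize, len: usize) -> Self {
        Self::new(
            self.values[offset..offset + len].to_vec(),
            self.valid[offset..offset + len].to_vec(),
        )
    }
}

/// One contiguous state: output is produced by slicing off
/// `block_size` rows at a time, paying the rescan on every slice.
fn emit_by_slicing(all: &ToyArray, block_size: usize) -> Vec<ToyArray> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < all.values.len() {
        let len = block_size.min(all.values.len() - offset);
        out.push(all.slice(offset, len));
        offset += len;
    }
    out
}

/// Blocked state: each block is already an output batch, so it is
/// moved out whole and no slicing (or rescan) happens at emit time.
fn emit_blocks(blocks: Vec<ToyArray>) -> Vec<ToyArray> {
    blocks
}
```

   Both functions produce the same batches when `block_size` matches the block length; the blocked variant just skips the per-slice work at emit time.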



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

