Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2075154369
########## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs: ##########
@@ -93,20 +94,27 @@ where
        opt_filter: Option<&BooleanArray>,
        total_num_groups: usize,
    ) -> Result<()> {

Review Comment:

> yes, i mean block size. I would expect something like batchsize (4-8k), or maybe even bigger to have lower overhead? Did you run some experiments?

Yes, I tried it. `block_size` is now set equal to `batch_size`. I also tried a smaller `batch_size`, such as 1024, and this PR still shows an improvement over `main`. The reason is that this PR also eliminates the calls to `Array::slice`. Those calls are non-trivial because of the null count computation. For details, see: https://github.com/apache/arrow-rs/pull/6155
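The blocked-storage idea behind the comment can be sketched as follows. This is a minimal sketch under stated assumptions; the `BlockedValues` type and its methods are hypothetical and are not the PR's actual API.

- Per-group state lives in fixed-size blocks, with `block_size` assumed equal to `batch_size`.
- A group index maps to a (block, offset) address.
- Emitting the oldest groups moves a whole block's `Vec` out. This avoids splitting or slicing one large buffer.

The sketch uses plain `Vec`s rather than Arrow arrays, so it illustrates only the addressing and emission pattern, not the null-count cost of `Array::slice` itself.

```rust
use std::collections::VecDeque;

/// Hypothetical block-based storage for per-group accumulator state.
/// Group `g` is stored in block `g / block_size` at offset `g % block_size`.
pub struct BlockedValues<T: Copy> {
    block_size: usize,
    blocks: VecDeque<Vec<T>>,
    default: T,
}

impl<T: Copy> BlockedValues<T> {
    pub fn new(block_size: usize, default: T) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: VecDeque::new(), default }
    }

    /// Ensure storage exists for `total_num_groups` groups. New blocks are
    /// allocated whole and filled with `default` (simplification: the last
    /// block is not trimmed to the exact group count).
    pub fn resize(&mut self, total_num_groups: usize) {
        let needed = (total_num_groups + self.block_size - 1) / self.block_size;
        while self.blocks.len() < needed {
            self.blocks.push_back(vec![self.default; self.block_size]);
        }
    }

    /// Map a flat group index to its (block, offset) address.
    fn address(&self, group_index: usize) -> (usize, usize) {
        (group_index / self.block_size, group_index % self.block_size)
    }

    /// Apply `f` to the state of one group, like one step of `update_batch`.
    pub fn update(&mut self, group_index: usize, f: impl FnOnce(&mut T)) {
        let (block, offset) = self.address(group_index);
        f(&mut self.blocks[block][offset]);
    }

    /// Emit the oldest block by moving its `Vec` out: the remaining groups
    /// are neither copied nor sliced, unlike splitting one flat `Vec` at `n`.
    pub fn emit_first_block(&mut self) -> Option<Vec<T>> {
        self.blocks.pop_front()
    }

    pub fn num_blocks(&self) -> usize {
        self.blocks.len()
    }
}

fn main() {
    // Assumption: block_size equals a batch_size of 1024, as in the comment.
    let mut values = BlockedValues::new(1024, 0i64);
    values.resize(2500); // needs 3 blocks
    values.update(1500, |v| *v += 7); // block 1, offset 476

    let first = values.emit_first_block().unwrap();
    assert_eq!(first.len(), 1024);

    // After emitting block 0, the former block 1 is now at the front.
    let second = values.emit_first_block().unwrap();
    assert_eq!(second[476], 7);
    assert_eq!(values.num_blocks(), 1);
}
```

The design choice shown here is that emission cost is independent of how many groups remain. The trade-off is that a larger `block_size` wastes more space in the final, partially filled block.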