Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

via GitHub Tue, 06 May 2025 03:11:52 -0700


Rachelint commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r2075154369



##########
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs:
##########
@@ -93,20 +94,27 @@ where
         opt_filter: Option<&BooleanArray>,
         total_num_groups: usize,
     ) -> Result<()> {

Review Comment:
   > yes, i mean block size. I would expect something like batchsize (4-8k), or 
maybe even bigger to have lower overhead? Did you run some experiments?
   
   Yes, I tried it.
   
   For now I set `block_size = batch_size`.
   
   I also tried a smaller `batch_size`, such as 1024, and this PR still shows an 
improvement compared to `main`.
   
   That is because this PR also eliminates the call to `Array::slice`, whose 
cost is non-trivial because it has to compute the null count. For details, see: 
https://github.com/apache/arrow-rs/pull/6155
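
To make the blocked idea concrete, here is a minimal, std-only sketch of how per-group state could be kept in blocks of `block_size` entries so that each block can be handed out whole at emit time. It is not the code in this PR and does not use the arrow crates; `BlockedSums`, `emit_next_block`, and `demo` are hypothetical names chosen for illustration, and the sketch assumes all updates finish before emission starts.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked accumulator state: one running sum per group,
/// stored in blocks of at most `block_size` entries.
struct BlockedSums {
    block_size: usize,
    len: usize, // number of groups currently stored
    blocks: VecDeque<Vec<i64>>,
}

impl BlockedSums {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, len: 0, blocks: VecDeque::new() }
    }

    /// Grow the state so that group indices `0..total_num_groups` are valid.
    fn resize(&mut self, total_num_groups: usize) {
        while self.len < total_num_groups {
            // Start a new block when there is none yet or the last one is full.
            let need_new = self
                .blocks
                .back()
                .map_or(true, |b| b.len() == self.block_size);
            if need_new {
                self.blocks.push_back(Vec::with_capacity(self.block_size));
            }
            let last = self.blocks.back_mut().expect("a block was just ensured");
            let take = (self.block_size - last.len()).min(total_num_groups - self.len);
            last.resize(last.len() + take, 0);
            self.len += take;
        }
    }

    /// Add `value` to the sum of the group at `group_index`.
    fn update(&mut self, group_index: usize, value: i64) {
        let block = &mut self.blocks[group_index / self.block_size];
        block[group_index % self.block_size] += value;
    }

    /// Hand out the next block by moving it; nothing is copied or sliced.
    fn emit_next_block(&mut self) -> Option<Vec<i64>> {
        let block = self.blocks.pop_front()?;
        self.len -= block.len();
        Some(block)
    }
}

/// Small usage example: six groups stored with a block size of four.
fn demo() -> Vec<Vec<i64>> {
    let mut sums = BlockedSums::new(4);
    sums.resize(6);
    for (group, value) in [(0, 10), (5, 7), (0, 1), (3, 2)] {
        sums.update(group, value);
    }
    let mut emitted = Vec::new();
    while let Some(block) = sums.emit_next_block() {
        emitted.push(block);
    }
    emitted
}
```

In this sketch each emitted block is already an output-sized unit, so nothing has to be cut out of one large buffer afterwards; with `block_size = batch_size`, every block except possibly the last corresponds to one full output batch.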



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


