Dandandan commented on issue #839: URL: https://github.com/apache/arrow-datafusion/issues/839#issuecomment-895017194
> It's done to the entire record batches. Considering the batches are 1024 rows with 3 columns, if the key number is 256, then there will be 256 small batches after the take, each batch is about 4 rows. `Simd` will make no sense in this situation. > > > Yes, but not in the code you linked. There the `take` is done once for the arrays in `aggr_input_values`, regardless of how many distinct keys we have in the batch. The take only rearranges the data to be in the same order as the keys but does it for only once for each input array for the aggregate expression values (not even the entire batch). In other places, yes, operations like `Array::slice` are done for each key that occurred which starts to become inefficient when having a lot of small groups in the batch and also show up in profiles. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
