[GitHub] [arrow-datafusion] Dandandan commented on pull request #6800: RFC: Demonstrate new `GroupHashAggregate` stream approach (runs more than 2x faster!)

via GitHub Wed, 05 Jul 2023 11:05:56 -0700


Dandandan commented on PR #6800:
URL: 
https://github.com/apache/arrow-datafusion/pull/6800#issuecomment-1622236683


   > > MIN/MAX are going to be a bit trickier than the other ones as they also
   > > support non-primitive types (e.g. strings). I will first implement the
   > > primitive version of those if that sounds fair @alamb.
   >
   > Sounds great! Figuring out how to extend the model for strings will be a
   > good exercise I think
   
   The easiest approach would be to store the elements in a `Vec<String>` (as
it may need to grow) or similar. We can mutate the existing strings in place
(instead of creating new ones for replacements) to keep the allocations a bit
lower.
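
A minimal sketch of that first idea, under stated assumptions: `StringMinState`, `update` and `get` are hypothetical names for illustration and are not taken from the PR, and the state is a `Vec<Option<String>>` rather than a plain `Vec<String>` so that groups without a value yet can be represented. On replacement the existing `String` is cleared and refilled, so its allocation is reused whenever its capacity is large enough.

```rust
/// Hypothetical per-group MIN state for strings (illustration only,
/// not DataFusion's actual accumulator API).
struct StringMinState {
    /// One entry per group index; `None` means no value seen yet.
    mins: Vec<Option<String>>,
}

impl StringMinState {
    fn new() -> Self {
        Self { mins: Vec::new() }
    }

    /// Fold `value` into the running minimum of `group`.
    fn update(&mut self, group: usize, value: &str) {
        // Grow the state when a new group index shows up.
        if group >= self.mins.len() {
            self.mins.resize(group + 1, None);
        }
        let slot = &mut self.mins[group];
        match slot {
            Some(current) => {
                if value < current.as_str() {
                    // Mutate the existing String instead of creating a new
                    // one: `clear` keeps the capacity, so `push_str` only
                    // reallocates if the new value does not fit.
                    current.clear();
                    current.push_str(value);
                }
            }
            None => *slot = Some(value.to_owned()),
        }
    }

    /// Current minimum for `group`, if any value has been seen.
    fn get(&self, group: usize) -> Option<&str> {
        self.mins.get(group).and_then(|m| m.as_deref())
    }
}
```

A MAX variant would only flip the comparison.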
   
   An alternative approach would be to keep a number of buffers, each holding
variable-sized data up to a certain maximum size (in buckets of, say, 10, 20,
30 bytes), plus a list of "free" slots whose items have been moved to another
bucket. I think this might be a fast approach for small strings, but it also
introduces some complexity.
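
One way the bucketed idea could look, as a rough sketch rather than a proposal for the actual implementation: `BucketedStore`, `SizeClass` and `Handle` are hypothetical names, the slot sizes are whatever the caller passes in (assumed ascending and non-zero), and values larger than the biggest bucket are simply rejected here, whereas a real implementation would need a fallback.

```rust
/// Where a value lives: size class, slot inside it, and its byte length.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    class: usize,
    slot: usize,
    len: usize,
}

/// One bucket: fixed-size slots stored back to back in a single buffer.
struct SizeClass {
    slot_size: usize,
    data: Vec<u8>,
    /// Slots vacated when a value moved to another bucket.
    free: Vec<usize>,
}

/// Hypothetical store with one buffer per size class (illustration only).
struct BucketedStore {
    classes: Vec<SizeClass>,
}

impl BucketedStore {
    /// `slot_sizes` is assumed to be ascending and non-zero, e.g. [10, 20, 30].
    fn new(slot_sizes: &[usize]) -> Self {
        let classes = slot_sizes
            .iter()
            .map(|&slot_size| SizeClass { slot_size, data: Vec::new(), free: Vec::new() })
            .collect();
        Self { classes }
    }

    /// Copy `value` into the given slot; the caller guarantees it fits.
    fn write(&mut self, class: usize, slot: usize, value: &str) {
        let c = &mut self.classes[class];
        let start = slot * c.slot_size;
        c.data[start..start + value.len()].copy_from_slice(value.as_bytes());
    }

    /// Store `value` in the smallest bucket that fits, reusing a free slot
    /// if one exists. Returns `None` if no bucket is large enough.
    fn insert(&mut self, value: &str) -> Option<Handle> {
        let class = self.classes.iter().position(|c| value.len() <= c.slot_size)?;
        let c = &mut self.classes[class];
        let slot = match c.free.pop() {
            Some(slot) => slot,
            None => {
                let slot = c.data.len() / c.slot_size;
                c.data.resize(c.data.len() + c.slot_size, 0);
                slot
            }
        };
        self.write(class, slot, value);
        Some(Handle { class, slot, len: value.len() })
    }

    /// Overwrite in place when the new value fits the current slot;
    /// otherwise move it to a larger bucket and free the old slot.
    fn replace(&mut self, h: Handle, value: &str) -> Option<Handle> {
        if value.len() <= self.classes[h.class].slot_size {
            self.write(h.class, h.slot, value);
            return Some(Handle { len: value.len(), ..h });
        }
        let moved = self.insert(value)?;
        self.classes[h.class].free.push(h.slot);
        Some(moved)
    }

    fn get(&self, h: Handle) -> &str {
        let c = &self.classes[h.class];
        let start = h.slot * c.slot_size;
        std::str::from_utf8(&c.data[start..start + h.len])
            .expect("slot holds bytes copied from a valid &str")
    }
}
```

The extra complexity mentioned above shows up in the handle bookkeeping: whoever owns a `Handle` has to store the new one after every `replace`, since the value may have moved to a different bucket.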


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
