Dandandan opened a new issue, #19938: URL: https://github.com/apache/datafusion/issues/19938
### Is your feature request related to a problem or challenge?

Currently, grouped aggregates follow this path (simplified):

* create hashes for the group-by columns
* group rows by hash using a hash table and check equality

The approach is well optimized, but we can avoid a lot of work if we don't have to hash and use a hash table at all.

### Describe the solution you'd like

When the column statistics, including the range (min/max), are known for a group-by column and the range is not too large, we can store the groups in a `Vec` where the element at index `i` represents the group `min + i`, using direct indexing. This could save a lot of overhead.

This is very similar to what is implemented in https://github.com/apache/datafusion/pull/19411 for joins.

### Describe alternatives you've considered

We could also consider computing the statistics on the fly and switching dynamically from direct indexing to a hash table (i.e. copying all entries into a hash table once the range exceeds the maximum).

### Additional context

_No response_
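To make the direct-indexing idea concrete, here is a minimal, hypothetical sketch. It is not DataFusion's API. It assumes a single non-null `i64` group-by column whose min/max come from statistics, and the names `GroupIndexer` and `max_range` are illustrative only. Keys map to dense group indices through `slots[key - min]`, with no hashing. It also sketches a variant of the alternative above: if a key falls outside `[min, max]` (for example, because the statistics were inexact), the existing groups are copied into a `HashMap` and hashing is used from then on.

```rust
use std::collections::HashMap;

/// Hypothetical mapper from i64 group keys to dense group indices (0, 1, 2, ...).
pub enum GroupIndexer {
    /// Direct indexing: `slots[(key - min) as usize]` holds the group index of `key`.
    Direct { min: i64, max: i64, slots: Vec<Option<usize>>, num_groups: usize },
    /// Hash-table fallback when the key range is too large or a key is out of range.
    Hashed(HashMap<i64, usize>),
}

impl GroupIndexer {
    /// Picks a strategy from (min, max) statistics; `max_range` caps the Vec size.
    pub fn new(min: i64, max: i64, max_range: usize) -> Self {
        // Compute the range in i128 so `max - min + 1` cannot overflow.
        let range = max as i128 - min as i128 + 1;
        if range > 0 && range <= max_range as i128 {
            GroupIndexer::Direct { min, max, slots: vec![None; range as usize], num_groups: 0 }
        } else {
            GroupIndexer::Hashed(HashMap::new())
        }
    }

    /// True if `key` can be served by the current strategy without converting.
    fn in_range(&self, key: i64) -> bool {
        match self {
            GroupIndexer::Direct { min, max, .. } => *min <= key && key <= *max,
            GroupIndexer::Hashed(_) => true,
        }
    }

    /// Copies every existing group into a HashMap, preserving group indices.
    fn convert_to_hashed(&mut self) {
        if let GroupIndexer::Direct { min, slots, .. } = self {
            let mut map = HashMap::with_capacity(slots.len());
            for (offset, slot) in slots.iter().enumerate() {
                if let Some(group) = slot {
                    map.insert(*min + offset as i64, *group);
                }
            }
            *self = GroupIndexer::Hashed(map);
        }
    }

    /// Returns the group index for `key`, allocating a new group on first sight.
    pub fn group_index(&mut self, key: i64) -> usize {
        if !self.in_range(key) {
            self.convert_to_hashed();
        }
        match self {
            GroupIndexer::Direct { min, slots, num_groups, .. } => {
                // No hashing: the key's offset from `min` is the slot position.
                let offset = (key - *min) as usize;
                let existing = slots[offset];
                match existing {
                    Some(group) => group,
                    None => {
                        let group = *num_groups;
                        slots[offset] = Some(group);
                        *num_groups += 1;
                        group
                    }
                }
            }
            GroupIndexer::Hashed(map) => {
                // Existing groups are 0..map.len(), so the next new index is map.len().
                let next = map.len();
                *map.entry(key).or_insert(next)
            }
        }
    }
}
```

For example, `GroupIndexer::new(10, 14, 1024)` uses a 5-slot `Vec`. Calling `group_index(12)` and then `group_index(10)` yields groups 0 and 1. A later `group_index(1_000)` switches to the hash map and yields group 2, while the earlier keys keep their indices. A real implementation would also need to handle nulls, multiple columns, and vectorized (per-batch) lookups.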
