Dandandan opened a new issue #26: URL: https://github.com/apache/arrow-datafusion/issues/26
Updating the hash aggregate implementation to use vectorized hashing should give a decent speedup to queries that depend on a fast hash aggregate implementation.

Currently, keys of type `Vec<u8>` are generated and hashed row by row, which causes:

* more memory usage
* slow re-hashing of the backing hash map
* type-unaware hashing for simple primitive values

The implementation should also handle hash collisions, so the original key values must remain available for comparison when two hashes match.

There is some WIP code here which can be used as a starting point / to continue from.
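To make the idea concrete, here is a minimal sketch of column-at-a-time hashing with collision handling. It is not the WIP code and not DataFusion's API: the names `combine`, `hash_column`, and `group_rows` are hypothetical, plain `Vec<i64>` columns stand in for Arrow arrays, and the mixing function is an arbitrary illustrative choice.

```rust
use std::collections::HashMap;

/// Mix one 64-bit value into an existing hash.
/// Hypothetical combiner, chosen only for illustration.
fn combine(h: u64, v: u64) -> u64 {
    (h ^ v).wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(31)
}

/// Hash one primitive column into the per-row hash buffer.
/// The loop is type-aware: no `Vec<u8>` key is built per row.
fn hash_column(col: &[i64], hashes: &mut [u64]) {
    for (h, v) in hashes.iter_mut().zip(col) {
        *h = combine(*h, *v as u64);
    }
}

/// Assign a group index to every row of the key columns.
/// Assumes all columns have the same length.
/// Returns (group index per row, number of groups).
fn group_rows(cols: &[Vec<i64>]) -> (Vec<usize>, usize) {
    let n = cols.first().map_or(0, |c| c.len());

    // One pass per column instead of one pass per row.
    let mut hashes = vec![0u64; n];
    for col in cols {
        hash_column(col, &mut hashes);
    }

    // hash -> list of (representative row, group index).
    // Note: `HashMap` hashes the u64 key again; a real implementation
    // would likely use a table that accepts precomputed hashes.
    let mut table: HashMap<u64, Vec<(usize, usize)>> = HashMap::new();
    let mut group_of = Vec::with_capacity(n);
    let mut num_groups = 0;

    for row in 0..n {
        let bucket = table.entry(hashes[row]).or_default();
        // Resolve collisions by comparing the original column values.
        let found = bucket
            .iter()
            .find(|(rep, _)| cols.iter().all(|c| c[*rep] == c[row]))
            .map(|(_, g)| *g);
        let group = match found {
            Some(g) => g,
            None => {
                bucket.push((row, num_groups));
                num_groups += 1;
                num_groups - 1
            }
        };
        group_of.push(group);
    }
    (group_of, num_groups)
}
```

Calling `group_rows` on two key columns returns one group index per input row. Rows whose hashes collide but whose values differ end up in separate groups, because the comparison is done on the column values rather than on the hash alone.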
