[GitHub] [arrow-datafusion] Dandandan opened a new issue #26: Vectorized hashing for hash aggregation code

GitBox Wed, 21 Apr 2021 10:59:30 -0700


Dandandan opened a new issue #26:
URL: https://github.com/apache/arrow-datafusion/issues/26



   Updating the hash aggregate implementation to use vectorized hashing should 
give a decent speedup to queries that depend on a fast hash aggregate 
implementation.
   
   Currently, group keys are built as `Vec<u8>` values and hashed row by row, 
which causes:
   
   * more memory usage
   * slow re-hashing of the backing hashmap
   * type-unaware hashing for simple primitive values
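
To illustrate the idea, here is a minimal sketch of column-at-a-time hashing using only the Rust standard library. It is not the DataFusion implementation: `combine_hashes`, `hash_one`, `hash_column` and `hash_rows` are hypothetical names, plain slices stand in for Arrow arrays, and `DefaultHasher` is an arbitrary choice of hash function. Instead of serializing each row into a `Vec<u8>` key, each column is hashed in one typed pass that updates a single `u64` per row.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: mixes the hash of one more column value into a row hash.
fn combine_hashes(row_hash: u64, value_hash: u64) -> u64 {
    row_hash.wrapping_mul(37).wrapping_add(value_hash)
}

/// Hashes a single typed value (assumption: any `Hash` type, std hasher).
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hashes a whole column in one pass, updating one `u64` per row in place.
/// No per-row byte buffer is allocated.
fn hash_column<T: Hash>(column: &[T], hashes: &mut [u64]) {
    assert_eq!(column.len(), hashes.len());
    for (row_hash, value) in hashes.iter_mut().zip(column) {
        *row_hash = combine_hashes(*row_hash, hash_one(value));
    }
}

/// Computes one hash per row for a two-column group key.
fn hash_rows(a: &[i32], b: &[u64]) -> Vec<u64> {
    assert_eq!(a.len(), b.len());
    let mut hashes = vec![0u64; a.len()];
    hash_column(a, &mut hashes);
    hash_column(b, &mut hashes);
    hashes
}
```

Rows with equal key values get equal hashes, and the only per-row state is a `u64` rather than a heap-allocated byte vector.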
   
   The implementation should also handle hash collisions, so it must be 
possible to compare the original key values when two hashes match.
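
One way to handle collisions, sketched under assumptions: the map is keyed by the precomputed row hash, and each bucket lists the groups whose keys share that hash, so a lookup compares the original key values before reusing a group. The function `group_indices` and the single `i64` key column are hypothetical simplifications, not DataFusion code.

```rust
use std::collections::HashMap;

/// Assigns a group index to every row, given the key column and the
/// precomputed per-row hashes. Returns (group index per row, key per group).
fn group_indices(keys: &[i64], hashes: &[u64]) -> (Vec<usize>, Vec<i64>) {
    assert_eq!(keys.len(), hashes.len());
    // hash -> indices of all groups whose key produced this hash
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    // original key value of each group, kept for collision checks
    let mut group_keys: Vec<i64> = Vec::new();
    let mut row_groups = Vec::with_capacity(keys.len());

    for (key, hash) in keys.iter().zip(hashes) {
        let bucket = buckets.entry(*hash).or_default();
        // Equal hashes are not enough: compare against the stored key.
        let found = bucket.iter().copied().find(|&g| group_keys[g] == *key);
        let group = match found {
            Some(g) => g,
            None => {
                // New key (possibly colliding with an existing one).
                group_keys.push(*key);
                let g = group_keys.len() - 1;
                bucket.push(g);
                g
            }
        };
        row_groups.push(group);
    }
    (row_groups, group_keys)
}
```

Even if every row hashes to the same value, distinct keys still end up in distinct groups, because a group is reused only after its stored key compares equal.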
   
   There is some WIP code here which can be used as a starting point / to 
continue from.


