[GitHub] [arrow-datafusion] alamb opened a new issue #846: Improve grouping performance by special casing small / fixed size keys

GitBox Mon, 09 Aug 2021 10:34:46 -0700


alamb opened a new issue #846:
URL: https://github.com/apache/arrow-datafusion/issues/846



   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The improved grouping algorithm in #790 speeds up grouping in DataFusion in general, and it is also general-purpose in that it works for all key types.
   
   However, @sundy-li noted in https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895180385 that further speedups are likely possible by special-casing "small" and fixed-size keys.
   
   
   
   **Describe the solution you'd like**
   From @sundy-li's comment:
   
   Introducing variant hash methods would help in this case. For example:
   
   For a query that groups by 3 columns of types [u8, u8, u16], a fixed-size u32 hash key is enough (1 + 1 + 2 = 4 bytes).
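As a minimal sketch of why 4 bytes suffice, the hypothetical helpers `pack_key` and `unpack_key` below combine one row's [u8, u8, u16] values into a single u32 and split it back apart. The bit layout is an assumption chosen for illustration; it is not taken from DataFusion or datafuse.

```rust
/// Hypothetical helper: packs one row's (u8, u8, u16) group-by values into a u32.
/// Assumed layout: bits 0..8 = a, bits 8..16 = b, bits 16..32 = c.
fn pack_key(a: u8, b: u8, c: u16) -> u32 {
    (a as u32) | ((b as u32) << 8) | ((c as u32) << 16)
}

/// Hypothetical inverse of `pack_key`, e.g. for emitting group values in the output.
fn unpack_key(key: u32) -> (u8, u8, u16) {
    (key as u8, (key >> 8) as u8, (key >> 16) as u16)
}

fn main() {
    let key = pack_key(0x12, 0x34, 0xABCD);
    // The three columns fit exactly into the 32 bits of the key.
    assert_eq!(key, 0xABCD_3412);
    assert_eq!(unpack_key(key), (0x12, 0x34, 0xABCD));
    println!("{:#010x}", key);
}
```

Because the mapping is lossless, the u32 can serve directly as the hash map key with no per-row serialization.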
   
   1. We can allocate one large fixed-size buffer instead of many separate `Vec<u8>` allocations.
   2. The fixed-size key reduces the memory used by the hash map.
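The two points above, together with the idea of variant hash methods, can be sketched roughly as follows. Everything here is hypothetical and for illustration only (`HashMethod`, `choose_method` and `group_u32` are not DataFusion or datafuse APIs): the sketch picks the narrowest fixed-width key that fits the group-by columns and falls back to serialized keys otherwise. For the u32 case it stores each distinct key once in a single contiguous `Vec<u32>`, with a `HashMap<u32, usize>` mapping it to a group index, instead of keeping a separate `Vec<u8>` per group.

```rust
use std::collections::HashMap;

/// Hypothetical hash-method variants (illustration only): use the narrowest
/// fixed-width integer that can hold all group-by columns, otherwise fall back
/// to variable-length serialized keys.
#[derive(Debug, PartialEq)]
enum HashMethod {
    KeysU8,
    KeysU16,
    KeysU32,
    KeysU64,
    Serialized,
}

/// Picks a method from the byte widths of the group-by columns, e.g.
/// [Some(1), Some(1), Some(2)] for [u8, u8, u16]. `None` marks a
/// variable-width column such as a string, which forces `Serialized`.
fn choose_method(widths: &[Option<usize>]) -> HashMethod {
    // Summing Options yields None if any column is variable-width.
    let total: Option<usize> = widths.iter().copied().sum();
    match total {
        Some(0..=1) => HashMethod::KeysU8,
        Some(2) => HashMethod::KeysU16,
        Some(3..=4) => HashMethod::KeysU32,
        Some(5..=8) => HashMethod::KeysU64,
        _ => HashMethod::Serialized,
    }
}

/// Groups rows of (u8, u8, u16) columns using packed u32 keys (assumes all
/// three slices have the same length). Returns the distinct keys, stored
/// contiguously in first-seen order, and the group index of each row.
fn group_u32(a: &[u8], b: &[u8], c: &[u16]) -> (Vec<u32>, Vec<usize>) {
    let mut map: HashMap<u32, usize> = HashMap::new();
    let mut group_keys: Vec<u32> = Vec::new();
    let mut group_ids: Vec<usize> = Vec::with_capacity(a.len());
    for ((&x, &y), &z) in a.iter().zip(b).zip(c) {
        let key = (x as u32) | ((y as u32) << 8) | ((z as u32) << 16);
        // New keys get the next dense group index; existing keys reuse theirs.
        let id = *map.entry(key).or_insert_with(|| {
            group_keys.push(key);
            group_keys.len() - 1
        });
        group_ids.push(id);
    }
    (group_keys, group_ids)
}

fn main() {
    assert_eq!(choose_method(&[Some(1), Some(1), Some(2)]), HashMethod::KeysU32);
    assert_eq!(choose_method(&[Some(8), None]), HashMethod::Serialized);

    let (keys, ids) = group_u32(&[1, 1, 2], &[0, 0, 0], &[7, 7, 7]);
    println!("{} groups, row -> group: {:?}", keys.len(), ids);
}
```

In this sketch each hash map entry holds a 4-byte key inline with no per-group heap allocation; whether that pays off in DataFusion's grouping path would need to be confirmed by benchmarks.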
   
   References:
   
https://github.com/datafuselabs/datafuse/blob/master/common/datablocks/src/kernels/data_block_group_by.rs#L17-L36
 
   
   
https://github.com/datafuselabs/datafuse/blob/master/common/datablocks/src/kernels/data_block_group_by_hash.rs#L264-L274
   
   


