sundy-li edited a comment on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895180385
> Another cause could be that hashing and comparing one Vec<u8> might be faster than hashing two single strings and combining them afterwards (however I would have expected the extra copying / rehashing to be worse than the single cost of hashing itself)

Introducing variant hash methods would help in this case. For example, for a query that groups by 3 columns of types [u8, u8, u16], a fixed-size u32 hash key is enough (8 + 8 + 16 = 32 bits).

1. We can allocate one large fixed-size block of memory instead of allocating many separate Vec<u8> keys.
2. The fixed-size keys reduce the memory used by the hash map.

Refer:
https://github.com/datafuselabs/datafuse/blob/master/common/datablocks/src/kernels/data_block_group_by.rs#L17-L36
https://github.com/datafuselabs/datafuse/blob/master/common/datablocks/src/kernels/data_block_group_by_hash.rs#L264-L274
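To illustrate the fixed-size key idea for the [u8, u8, u16] case, here is a minimal sketch. It is not the datafuse or DataFusion implementation; the names `pack_key`, `unpack_key`, and `group_counts` are hypothetical, and it assumes a plain `std::collections::HashMap` with a simple row count as the aggregate.

```rust
use std::collections::HashMap;

/// Pack the three group-by values into one fixed-width u32 key
/// (hypothetical helper): 8 bits + 8 bits + 16 bits = 32 bits.
fn pack_key(a: u8, b: u8, c: u16) -> u32 {
    ((a as u32) << 24) | ((b as u32) << 16) | (c as u32)
}

/// Recover the original column values from a packed key.
fn unpack_key(key: u32) -> (u8, u8, u16) {
    ((key >> 24) as u8, (key >> 16) as u8, key as u16)
}

/// Count rows per group using the packed u32 as the hash map key.
/// No per-row Vec<u8> is allocated; each key is a plain integer.
fn group_counts(col_a: &[u8], col_b: &[u8], col_c: &[u16]) -> HashMap<u32, u64> {
    assert!(col_a.len() == col_b.len() && col_b.len() == col_c.len());
    let mut groups: HashMap<u32, u64> = HashMap::new();
    for ((&a, &b), &c) in col_a.iter().zip(col_b).zip(col_c) {
        *groups.entry(pack_key(a, b, c)).or_insert(0) += 1;
    }
    groups
}
```

Because the key is a `u32`, the map stores it inline and compares it with a single integer comparison. The same approach would extend to `u64` or `u128` keys for wider combinations of fixed-width columns, with a fallback to byte-vector keys for variable-length types such as strings.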