Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-892983559
> @Dandandan if you have a moment, I would like to know if you have any concerns with the "change `create_hashes`" function item above, before I spend significant time on it

I will try to have a better look at this later.

The first feeling I have is that the example/proposal is:

* More row-based than `create_hashes` as it is today. The important part of vectorized hashing is that the inner loop should run over a single array of a single type, and should not have to jump between memory locations and different parts of the code to hash each row.
* Creating/keeping the more complex `GroupKey` per row, which makes creating the keys more expensive (an allocation per key, not cache friendly) and makes re-hashing the key more expensive (no simple or even identity function as the hash).
* Harder to vectorize further.

My belief is that using the Rust HashMap is not really the end state of the hash join and hash aggregate, but an easier way to implement it. It might still be an improvement over the current state (for hash aggregate); it looks like it simplifies some parts.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
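To illustrate the column-wise versus row-wise distinction raised in the comment, here is a minimal sketch. It is not DataFusion's actual `create_hashes` implementation: the `Column` enum, the `combine` mixer, and the `hash_columns` / `hash_row` functions are all hypothetical names invented for this example, and the mixer is chosen for brevity rather than hash quality.

```rust
/// Hypothetical column type for illustration only; not Arrow's or DataFusion's API.
enum Column {
    Int64(Vec<i64>),
    UInt32(Vec<u32>),
}

/// Mix one 64-bit value into an existing hash (simple xor-multiply-rotate).
#[inline]
fn combine(hash: u64, value: u64) -> u64 {
    (hash ^ value).wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(31)
}

/// Column-wise ("vectorized") hashing: the type dispatch happens once per
/// column, and each inner loop walks one contiguous, single-typed slice
/// while updating a flat buffer of per-row hashes.
fn hash_columns(columns: &[Column], hashes: &mut [u64]) {
    for column in columns {
        match column {
            Column::Int64(values) => {
                for (h, v) in hashes.iter_mut().zip(values) {
                    *h = combine(*h, *v as u64);
                }
            }
            Column::UInt32(values) => {
                for (h, v) in hashes.iter_mut().zip(values) {
                    *h = combine(*h, u64::from(*v));
                }
            }
        }
    }
}

/// Row-wise hashing: the type dispatch is repeated for every row, and each
/// row touches every column's memory in turn.
fn hash_row(columns: &[Column], row: usize) -> u64 {
    let mut h = 0u64;
    for column in columns {
        h = match column {
            Column::Int64(values) => combine(h, values[row] as u64),
            Column::UInt32(values) => combine(h, u64::from(values[row])),
        };
    }
    h
}
```

Both functions produce the same hash for a given row; they differ only in loop order. Calling `hash_columns` with a zero-initialized `hashes` buffer of length equal to the row count fills in all row hashes with one pass per column, which is the access pattern the comment argues is easier to keep cache friendly and to vectorize further.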
