[ 
https://issues.apache.org/jira/browse/ARROW-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332337#comment-17332337
 ] 

Andrew Lamb commented on ARROW-11112:
-------------------------------------

Migrated to github: https://github.com/apache/arrow-datafusion/issues/142

> [Rust][DataFusion] Implement vectorized hashing
> -----------------------------------------------
>
>                 Key: ARROW-11112
>                 URL: https://issues.apache.org/jira/browse/ARROW-11112
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Daniël Heres
>            Priority: Major
>
> Currently, the join and hash aggregate operators build each key 
> individually from the row values. This is far from ideal: it does not 
> exploit the cache-friendly, vectorized (columnar) layout of Arrow, and 
> instead copies data into a Vec, traverses multiple arrays in the inner loop, etc.
> This blog post summarizes an approach for doing this in a vectorized way:
> [https://www.cockroachlabs.com/blog/vectorized-hash-joiner/]
>  
> TBD:
> We should decide/find out whether it still makes sense to use Rust's `HashMap` 
> (with () as key?) or whether to implement our own. The benefits of `HashMap` are 
> that it has an existing API, resizes automatically, uses SIMD, and also 
> exposes some lower-level bits we can use here.
>  
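
For illustration only, here is a minimal sketch of the column-at-a-time idea
described above, using just the Rust standard library. It is not DataFusion
code. Plain slices stand in for Arrow arrays, and `combine`, `hash_column`,
`IdentityHasher`, and `build_row_index` are hypothetical names invented for
this example. The combining function is an arbitrary choice, and keying the
map by the precomputed hash is only one possible reading of the open `HashMap`
question, not an answer to it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Fold one column value's hash into the running per-row hash.
/// (Arbitrary combiner chosen for the sketch, not a tuned one.)
fn combine(seed: u64, value_hash: u64) -> u64 {
    seed.wrapping_mul(31).wrapping_add(value_hash)
}

/// Hash a single value with the standard library's default hasher.
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Update `hashes` in place with one whole column.
/// The inner loop walks a single contiguous slice, instead of
/// visiting every column for every row.
fn hash_column<T: Hash>(column: &[T], hashes: &mut [u64]) {
    assert_eq!(column.len(), hashes.len());
    for (h, v) in hashes.iter_mut().zip(column) {
        *h = combine(*h, hash_one(v));
    }
}

/// Hasher that passes an already computed u64 through unchanged,
/// so the map does not hash the key a second time.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("IdentityHasher only supports u64 keys")
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

/// Map from precomputed row hash to the indices of rows with that hash.
type RowIndexMap = HashMap<u64, Vec<usize>, BuildHasherDefault<IdentityHasher>>;

/// Group row indices by their precomputed hash.
/// Rows in one bucket may still differ (hash collisions), so a real
/// join or aggregate must compare the column values afterwards.
fn build_row_index(hashes: &[u64]) -> RowIndexMap {
    let mut map = RowIndexMap::default();
    for (row, &h) in hashes.iter().enumerate() {
        map.entry(h).or_default().push(row);
    }
    map
}
```

To use it, start from a zeroed `Vec<u64>` with one entry per row, call
`hash_column` once per key column, then pass the result to `build_row_index`.
Rows whose key columns are all equal end up with the same hash and therefore
in the same bucket.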



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
