alamb commented on pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#issuecomment-898877303
That is an interesting article. Looking at the summary:

> The common implementation of the function using hashing techniques suffers lower throughput rate due to the collision of the insert keys in the hashing techniques.....

I actually found it very hard to test the group by collision handling correctness because the hashing technique in `create_hashes` was so good I could not find any example data that hashed to the same value in a reasonable amount of time -- LOL

However, the technique to search several slots at once might indeed be relevant <https://www.researchgate.net/figure/SIMD-accelerated-cuckoo-hashing-extended-from-Ross-et-al-14_fig1_326669722>

On Sat, Aug 14, 2021 at 4:58 AM Jorge Leitao ***@***.***> wrote:

> Potentially relevant:
> https://www.researchgate.net/publication/326669722_SIMD_Vectorized_Hashing_for_Grouped_Aggregation_22nd_European_Conference_ADBIS_2018_Budapest_Hungary_September_2-5_2018_Proceedings
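One way around the "hash is too good to collide" testing problem is to make the hash function injectable and feed it a deliberately degenerate one. The sketch below is a minimal illustration under that assumption, not DataFusion's implementation. `GroupTable`, `group_index`, and the pluggable `hash_fn` are hypothetical names. The table is a plain `std::collections::HashMap` keyed by a precomputed 64-bit hash, not the real `create_hashes` path. Because every key is forced into one bucket, the key-comparison (collision) branch runs on every lookup.

```rust
use std::collections::HashMap;

/// Hypothetical group-by table: maps a 64-bit hash to the (key, group index)
/// entries that share that hash. Colliding keys are told apart by comparing
/// the actual key values, not just their hashes.
struct GroupTable<F: Fn(&str) -> u64> {
    hash_fn: F,
    buckets: HashMap<u64, Vec<(String, usize)>>,
    num_groups: usize,
}

impl<F: Fn(&str) -> u64> GroupTable<F> {
    fn new(hash_fn: F) -> Self {
        Self { hash_fn, buckets: HashMap::new(), num_groups: 0 }
    }

    /// Returns the group index for `key`, creating a new group if it is unseen.
    fn group_index(&mut self, key: &str) -> usize {
        let h = (self.hash_fn)(key);
        let entries = self.buckets.entry(h).or_default();
        // Equal hashes are not enough: compare the stored keys themselves.
        if let Some((_, idx)) = entries.iter().find(|(k, _)| k == key) {
            return *idx;
        }
        let idx = self.num_groups;
        entries.push((key.to_string(), idx));
        self.num_groups += 1;
        idx
    }
}

fn main() {
    // Degenerate hash: every key maps to 42, so every insert collides.
    let mut table = GroupTable::new(|_key: &str| -> u64 { 42 });
    let a = table.group_index("a");
    let b = table.group_index("b");
    let a_again = table.group_index("a");

    assert_eq!(a, a_again); // same key -> same group despite the collision
    assert_ne!(a, b); // different keys -> different groups
    assert_eq!(table.buckets.len(), 1); // all keys landed in one bucket
    println!("groups: {}", table.num_groups);
}
```

This sketch scans colliding entries one at a time; the SIMD approach in the linked paper instead compares several stored slots in a single instruction, which this scalar code does not attempt.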
