alamb opened a new pull request #9233: URL: https://github.com/apache/arrow/pull/9233
I am putting up a draft PR to give people a heads-up that I am working on this feature.

This PR adds support for GROUP BY on columns of Dictionary type.

This will definitely conflict with Dandan's implementation of vectorized hashes / group by in https://github.com/apache/arrow/pull/9213 and https://github.com/apache/arrow/pull/9116. I plan to rework this PR once his are merged.

The code basically just follows the pattern (aka is mostly copy/paste) from the take kernel: https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/cast.rs#L294

I chose the "correct first, optimize later" approach here. There are many ways to make this code faster, especially when grouping on string types.

It feels like a lot of copy/paste, and I don't feel great about the coverage of all the types. I am contemplating some more interesting / full coverage, as well as whether there is some way to reuse the recurring pattern of switch and dispatch for dictionary types.
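For readers unfamiliar with dictionary-encoded columns, here is a minimal sketch of the "correct first, optimize later" idea: resolve each row's key to its underlying value, then use that value as the group key. It uses only the Rust standard library. `DictColumn` and `group_count` are hypothetical names made up for illustration; they are not the Arrow `DictionaryArray` API and not the code in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical toy stand-in for a dictionary-encoded string column:
/// each row stores a key (an index into `values`), or `None` for a null row.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

impl DictColumn {
    /// Resolve the value for `row` by looking its key up in the dictionary.
    fn value_at(&self, row: usize) -> Option<&str> {
        self.keys[row].map(|k| self.values[k].as_str())
    }
}

/// Count rows per group, using the resolved value (not the key) as the
/// group key. Cloning every string is the simple-but-slow part that a
/// later optimization could avoid, e.g. by grouping on the keys directly.
fn group_count(col: &DictColumn) -> HashMap<Option<String>, usize> {
    let mut groups = HashMap::new();
    for row in 0..col.keys.len() {
        let key = col.value_at(row).map(|s| s.to_string());
        *groups.entry(key).or_insert(0) += 1;
    }
    groups
}
```

In this sketch null rows form their own group; whether that matches the intended GROUP BY semantics for nulls is an assumption of the example, not something stated above.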
