alamb opened a new pull request #9233:
URL: https://github.com/apache/arrow/pull/9233


   I am putting up a draft PR to give people a heads-up that I am working 
on this feature.
   
   This PR adds support for GROUP BY on columns of Dictionary type. 
   
   This definitely will conflict with Dandan's implementation of vectorized 
hashes / group by in https://github.com/apache/arrow/pull/9213 and 
https://github.com/apache/arrow/pull/9116. I plan to rework this PR once his 
are merged.
   
   The code basically follows the pattern (it is mostly copy/paste) from 
the take kernel: 
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/cast.rs#L294
   
   I chose the "correct first, optimize later" approach here -- there are many 
ways to make this code faster, especially when grouping on string types.
   
   It feels like a lot of copy/paste, and I don't feel great about the coverage 
of all the types. I am contemplating fuller coverage, as well as whether 
there is some way to reuse the recurring switch-and-dispatch pattern for 
dictionary types.
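
   To illustrate the switch-and-dispatch pattern mentioned above, here is a 
minimal sketch that assumes nothing about the actual arrow or DataFusion 
code. The types `DictKeys` and `DictColumn` and the functions `count_by_key` 
and `group_count` are hypothetical stand-ins, not part of any real API. The 
idea is that the per-key-type work is written once as a generic function, and 
the match only forwards each key type to it.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

// Hypothetical stand-in for a dictionary-encoded string column: integer keys
// index into a shared `values` vector. This is NOT the arrow crate's API.
enum DictKeys {
    Int8(Vec<i8>),
    Int32(Vec<i32>),
}

struct DictColumn {
    keys: DictKeys,
    values: Vec<String>,
}

// Generic over the key width, so the grouping logic is written only once.
fn count_by_key<K>(keys: &[K], values: &[String]) -> HashMap<String, usize>
where
    K: Copy + TryInto<usize>,
{
    let mut counts = HashMap::new();
    for &k in keys {
        // Negative keys cannot be converted to an index; skip them here.
        let idx: usize = match k.try_into() {
            Ok(i) => i,
            Err(_) => continue,
        };
        // Out-of-range keys are also skipped in this sketch.
        if let Some(v) = values.get(idx) {
            *counts.entry(v.clone()).or_insert(0) += 1;
        }
    }
    counts
}

// The "switch and dispatch" step: one match arm per key type, each forwarding
// to the same generic function.
fn group_count(col: &DictColumn) -> HashMap<String, usize> {
    match &col.keys {
        DictKeys::Int8(k) => count_by_key(k, &col.values),
        DictKeys::Int32(k) => count_by_key(k, &col.values),
    }
}
```

   Cloning each value string per row follows the "correct first" approach; 
a faster variant could group on the integer keys and resolve the strings 
once per group.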

