westonpace commented on issue #11799: URL: https://github.com/apache/arrow/issues/11799#issuecomment-984224525
The `hash_*` functions all take, as their last argument, a uint32 array of group ids, which explains the error you are seeing. However, even if you corrected that, you would hit a second error: `Direct execution of HASH_AGGREGATE functions`. As of 6.0.1, direct execution was deliberately prevented, I think because we didn't want to confuse users who expected something more like a "group by" operation that both computes the group ids and performs the aggregation. The computation of group ids is not currently exposed because it is stateful, and we haven't exposed any stateful kernels.

So, at the moment, I think you may be out of luck with 6.0.1. I believe the only way to use these kernels is through a query plan directly, and that hasn't been documented yet (other than via dplyr). Work is underway to document query plans in C++ and to expose them in Python via Ibis, and there has been some discussion about exposing the hash and grouping kernels directly, since that could be useful and simple for "dataset-in-memory" operations. And, of course, there is the approach that Alenka shared. So I think you will have several options once 7.0.0 releases.

If you're interested in an undocumented C++ approach, I could share a snippet showing how you could use a query plan to accomplish what you want, although I would need to know a bit more about what you are trying to do. Were you intending to group by the string column and return the sums of the int64 column?
