[ 
https://issues.apache.org/jira/browse/ARROW-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804371#comment-16804371
 ] 

Wes McKinney commented on ARROW-5002:
-------------------------------------

I think what we are discussing here is called "hash aggregation" in the 
database literature and is a type of execution node. This is related to 
ARROW-3978, where we need to be able to compute group ordinal indexes via 
hashing for multiple keys.
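
To make the term concrete, here is a minimal sketch of hash aggregation over 
two key columns. It is plain standard C++, not Arrow code: the names Key, 
KeyHash, GroupedSum and HashAggregateSum are hypothetical, the keys are 
assumed to be int32 and the aggregate is assumed to be a sum. Each distinct 
key pair gets a dense group ordinal the first time it is seen, and the 
aggregate is updated in the same pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical composite key made of two int32 key columns (not an Arrow type).
using Key = std::pair<int32_t, int32_t>;

// Combines the hashes of both key components into one hash value.
struct KeyHash {
  std::size_t operator()(const Key& k) const {
    std::size_t h1 = std::hash<int32_t>{}(k.first);
    std::size_t h2 = std::hash<int32_t>{}(k.second);
    return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2));
  }
};

struct GroupedSum {
  std::vector<int32_t> group_ids;  // one group ordinal per input row
  std::vector<Key> keys;           // distinct keys in first-seen order
  std::vector<int64_t> sums;       // one running sum per group
};

// Assumes key0, key1 and values all have the same length.
GroupedSum HashAggregateSum(const std::vector<int32_t>& key0,
                            const std::vector<int32_t>& key1,
                            const std::vector<int64_t>& values) {
  GroupedSum out;
  std::unordered_map<Key, int32_t, KeyHash> lookup;
  out.group_ids.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    const Key k{key0[i], key1[i]};
    // A key not seen before receives the next dense ordinal.
    auto result = lookup.emplace(k, static_cast<int32_t>(out.keys.size()));
    if (result.second) {
      out.keys.push_back(k);
      out.sums.push_back(0);
    }
    const int32_t gid = result.first->second;
    out.group_ids.push_back(gid);
    // Update the aggregate in place; no intermediate Take-style copy is made.
    out.sums[gid] += values[i];
  }
  return out;
}
```

For key columns {1, 1, 2, 1} and {10, 20, 10, 10} with values {5, 7, 11, 13}, 
this yields group ordinals {0, 1, 2, 0} and sums {18, 7, 11}. Stopping after 
the group_ids step gives the "just return indices" variant.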

I'm not sure about the "ConsumeWithGroups" API. What do other analytic database 
engines (Impala, ClickHouse, etc.) do?

> [C++] Implement GroupBy
> -----------------------
>
>                 Key: ARROW-5002
>                 URL: https://issues.apache.org/jira/browse/ARROW-5002
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Philipp Moritz
>            Priority: Major
>
> Dear all,
> I wonder what the best way forward is for implementing GroupBy kernels. 
> Initially this was part of
> https://issues.apache.org/jira/browse/ARROW-4124
> but is not contained in the current implementation as far as I can tell.
> It seems that the part of group by that just returns indices could be 
> conveniently implemented with the HashKernel. That seems useful in any case. 
> Is that indeed the best way forward/should this be done?
> GroupBy + Aggregate could then be implemented either with that + the Take 
> kernel + aggregation (though this involves more memory copies than 
> necessary) or as part of the aggregate kernel. The latter is probably 
> preferred; any thoughts on that?
> Am I missing any other JIRAs related to this?
> Best, Philipp.



