[ https://issues.apache.org/jira/browse/ARROW-12710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379970#comment-17379970 ]

Ben Kietzman commented on ARROW-12710:
--------------------------------------

Our current scalar aggregation consumes its input in a deterministic order, but 
that guarantee breaks down as soon as those functions are used in an ExecPlan, 
where no ordering is guaranteed.

If we implement this as a ScalarAggregateKernel, the KernelState of the string 
concat agg kernel will need to include ordering criteria so that 
{{merge(move(state1), &state0)}} is guaranteed to be equivalent to 
{{merge(move(state0), &state1)}}. Furthermore, {{merge}} cannot actually 
concatenate anything: if we happened to {{merge(move(state0), &state3)}} first, 
we would have no way to insert {{state1}} and {{state2}} in the middle later. 
Actual concatenation would have to wait for {{finalize}}.
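A minimal sketch of such a state, assuming each consumed value carries a unique integer ordering key. {{ConcatState}}, {{Consume}}, {{MergeFrom}} and {{Finalize}} are hypothetical names, not Arrow's KernelState API. {{MergeFrom}} only moves (key, value) pairs between states, so any merge order yields the same result, and the actual join happens in {{Finalize}}.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical order-insensitive state for a string concat aggregate.
// Illustrative only; not Arrow's KernelState interface.
struct ConcatState {
  // (ordering key, value) pairs; concatenation is deferred to Finalize().
  std::vector<std::pair<int64_t, std::string>> parts;

  void Consume(int64_t key, std::string value) {
    parts.emplace_back(key, std::move(value));
  }

  // Merge only moves pairs across and never concatenates, so merging
  // state3 into state0 first still leaves room for state1 and state2.
  void MergeFrom(ConcatState&& src) {
    for (auto& p : src.parts) parts.push_back(std::move(p));
    src.parts.clear();
  }

  // Sort by ordering key (assumed unique), then join with the separator.
  std::string Finalize(const std::string& sep) {
    std::sort(parts.begin(), parts.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
      if (i > 0) out += sep;
      out += parts[i].second;
    }
    return out;
  }
};
```

Merging {{s3}} into {{s0}}, then {{s1}} into {{s2}}, then {{s2}} into {{s0}} still finalizes to {{"foo-bar-baz-qux"}} when keys 0 to 3 were assigned to "foo", "bar", "baz", "qux". Note that every value is retained until {{Finalize}}, so the state grows with the input.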

Those ordering criteria could be synthesized from (for example) fragment/batch 
indices in a dataset scan, but a scalar agg kernel whose State grows as O(N) in 
the number of input rows seems suspect to me, and I'm not sure this is a good 
match for ScalarAggregateKernel.
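One way such criteria might look, assuming the scan can tag each row with its fragment index, batch index within the fragment, and row index within the batch. {{ScanOrderKey}} and {{KeysForBatch}} are hypothetical names for illustration. The key compares lexicographically, and one key is kept per row, which is exactly the O(N) state in question.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical ordering key synthesized during a dataset scan:
// (fragment index, batch index within fragment, row index within batch).
struct ScanOrderKey {
  int32_t fragment;
  int32_t batch;
  int64_t row;

  // Lexicographic comparison: fragment first, then batch, then row.
  friend bool operator<(const ScanOrderKey& a, const ScanOrderKey& b) {
    return std::tie(a.fragment, a.batch, a.row) <
           std::tie(b.fragment, b.batch, b.row);
  }
};

// One key per row of a batch: the retained state grows linearly with input.
std::vector<ScanOrderKey> KeysForBatch(int32_t fragment, int32_t batch,
                                       int64_t num_rows) {
  std::vector<ScanOrderKey> keys;
  keys.reserve(static_cast<std::size_t>(num_rows));
  for (int64_t r = 0; r < num_rows; ++r) keys.push_back({fragment, batch, r});
  return keys;
}
```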

> [C++] String concatenate aggregate kernel
> -----------------------------------------
>
>                 Key: ARROW-12710
>                 URL: https://issues.apache.org/jira/browse/ARROW-12710
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Ian Cook
>            Priority: Major
>
> Like MySQL/Impala {{group_concat}} and PostgreSQL {{string_agg}}. Takes a 
> string array and a separator (possibly optional?) and returns one scalar 
> string (one per group in the case of group aggregation) representing all the 
> string values in the array concatenated together, with the separator added 
> between each pair of concatenated values.
> For example, in the case of no grouping and using separator {{"-"}}, this 
> would take input:
> {code}
> Array<string>
> [ 
>   "foo",
>   "bar",
>   "baz"
> ]
> {code}
> and return the following string scalar as output:
> {code}
> "foo-bar-baz"
> {code}
>  
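A reference sketch of the semantics described in the issue, assuming values are joined in input order and that nulls and an optional separator are out of scope. {{StringAgg}} and {{GroupConcat}} are hypothetical helper names, not existing Arrow compute functions; the grouped variant uses {{std::map}}, so results come out sorted by group key (an illustrative choice only).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical non-grouped semantics: join values in input order,
// inserting `sep` between each adjacent pair.
std::string StringAgg(const std::vector<std::string>& values,
                      const std::string& sep) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out += sep;
    out += values[i];
  }
  return out;
}

// Hypothetical grouped semantics: one joined string per distinct key,
// with values kept in input order within each group.
std::map<std::string, std::string> GroupConcat(
    const std::vector<std::string>& keys,
    const std::vector<std::string>& values, const std::string& sep) {
  std::map<std::string, std::vector<std::string>> groups;
  for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
    groups[keys[i]].push_back(values[i]);
  }
  std::map<std::string, std::string> out;
  for (const auto& kv : groups) out[kv.first] = StringAgg(kv.second, sep);
  return out;
}
```

With the issue's example, {{StringAgg({"foo", "bar", "baz"}, "-")}} returns {{"foo-bar-baz"}}.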



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
