[
https://issues.apache.org/jira/browse/ARROW-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362971#comment-17362971
]
Ian Cook commented on ARROW-12709:
----------------------------------
[~apitrou] the intent for this ticket is a truly variadic kernel, equivalent to
the SQL string functions {{concat}} and {{concat_ws}}, which join the strings
across an arbitrary number of string columns. This is a common usage pattern in
data-warehouse-style queries.
For joining the strings in an array of lists of strings, we have {{binary_join}}
(ARROW-10959). That plus an {{adjoin_as_list}} function (ARROW-12739) gets us
to the same result as this kernel when the string-like columns have the same
type, but my understanding is that a single call to this variadic kernel will
be much faster and use much less memory than calling {{adjoin_as_list}} and
then {{binary_join}}.
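To make the comparison concrete, here is a minimal plain-Python sketch of the two routes. The three functions are hypothetical stand-ins operating on Python lists, not the Arrow kernels themselves, and nulls are not modeled. It only illustrates that both routes give the same result and that the two-step route materializes an intermediate list per row; it says nothing about actual Arrow performance.

```python
from typing import List, Sequence


def adjoin_as_list(*columns: Sequence[str]) -> List[List[str]]:
    # Hypothetical stand-in: zip equal-length columns into one list per row.
    return [list(row) for row in zip(*columns)]


def binary_join(lists: Sequence[Sequence[str]], separator: str) -> List[str]:
    # Hypothetical stand-in: join each row's list of strings with the separator.
    return [separator.join(row) for row in lists]


def variadic_join(*columns: Sequence[str], separator: str = "") -> List[str]:
    # Join row-wise directly, without building an intermediate list column.
    return [separator.join(row) for row in zip(*columns)]


first = ["foo", "push"]
second = ["bar", "pop"]

# Two-step route: build a list column, then join each list.
two_step = binary_join(adjoin_as_list(first, second), "-")

# One-step route: a single variadic call.
one_step = variadic_join(first, second, separator="-")
```

Both `two_step` and `one_step` hold the same joined strings; the difference is that the first route builds the full list column before any joining happens.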
> [C++] Add variadic string join kernel
> -------------------------------------
>
> Key: ARROW-12709
> URL: https://issues.apache.org/jira/browse/ARROW-12709
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Ian Cook
> Assignee: David Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Similar to SQL's {{concat}} and {{concat_ws}}. Should take 0, 1, 2, ...
> string arrays and an optional separator (default empty string) and
> concatenate them together, returning a string array.
> For example, in the case of 2 input arrays and with the separator {{"-"}},
> this would take inputs:
> {code}
> Array<string>    Array<string>
> [                [
>   "foo",           "bar",
>   "push"           "pop"
> ]                ]
> {code}
> and return output:
> {code}
> Array<string>
> [
> "foo-bar",
> "push-pop"
> ]
> {code}
> Should also accept scalar strings and recycle their values.
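The behaviour requested in the description above can be sketched in plain Python as a reference for the intended semantics. The name `concat_ws` and its signature are hypothetical, chosen to echo the SQL function; this is not the Arrow API. Two points are assumptions the description does not settle: zero inputs yield an empty result, and scalar-only inputs yield a single row. Nulls are not modeled.

```python
from typing import List, Sequence, Union

# An input is either a scalar string or an array (sequence) of strings.
StringInput = Union[str, Sequence[str]]


def concat_ws(*inputs: StringInput, separator: str = "") -> List[str]:
    # Array inputs determine the output length; they must all agree.
    lengths = {len(x) for x in inputs if not isinstance(x, str)}
    if len(lengths) > 1:
        raise ValueError("array inputs must have the same length")

    if lengths:
        n = lengths.pop()
    else:
        # Assumption: scalars alone yield one row; zero inputs yield no rows.
        n = 1 if inputs else 0

    # Recycle each scalar so that it lines up with every row.
    columns = [[x] * n if isinstance(x, str) else list(x) for x in inputs]

    # Join the values of each row with the separator.
    return [separator.join(row) for row in zip(*columns)]


joined = concat_ws(["foo", "push"], ["bar", "pop"], separator="-")
recycled = concat_ws(["foo", "push"], "x", separator="-")
```

Here `joined` reproduces the two-array example from the description, and `recycled` shows a scalar `"x"` being reused for every row.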
--
This message was sent by Atlassian Jira
(v8.3.4#803005)