Maarten Breddels created ARROW-9991:
---------------------------------------
Summary: [C++] split kernels for strings/binary
Key: ARROW-9991
URL: https://issues.apache.org/jira/browse/ARROW-9991
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Maarten Breddels
Assignee: Maarten Breddels
Similar to Python str.split and bytes.split, we'd like to have a way to convert
str into list[str] (and similarly for bytes).
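To make "convert str into list[str]" concrete: an Arrow list array stores an offsets array with one more entry than there are rows, plus a single child array holding all elements of all lists. The sketch below is only a model of that output shape. It uses std::vector instead of Arrow buffers and ignores nulls. The names ListOfStrings and SplitColumn are hypothetical and are not Arrow APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for an Arrow list<string> result:
// offsets has (number of input rows + 1) entries, and row i's pieces are
// values[offsets[i]] .. values[offsets[i + 1] - 1].
struct ListOfStrings {
  std::vector<int32_t> offsets{0};
  std::vector<std::string> values;
};

// Splits every input string on `sep` and appends the pieces to one flat
// child vector, recording where each row's list ends.
ListOfStrings SplitColumn(const std::vector<std::string>& column,
                          std::string_view sep) {
  if (sep.empty()) {
    throw std::invalid_argument("empty separator");
  }
  ListOfStrings out;
  for (const std::string& s : column) {
    std::string_view rest(s);
    while (true) {
      std::size_t pos = rest.find(sep);
      if (pos == std::string_view::npos) {
        out.values.emplace_back(rest);  // last (or only) piece of this row
        break;
      }
      out.values.emplace_back(rest.substr(0, pos));
      rest.remove_prefix(pos + sep.size());
    }
    out.offsets.push_back(static_cast<int32_t>(out.values.size()));
  }
  return out;
}
```

For the input {"a,b", "c", ""} split on ",", this yields offsets {0, 2, 3, 4} and values {"a", "b", "c", ""}, i.e. the rows ["a", "b"], ["c"] and [""], as Python's str.split(",") gives per element.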
When the separator is given, the algorithms for both types are the same.
Python, however, overloads split: when given no separator, the algorithm
splits on runs of whitespace (Unicode whitespace for str, ASCII whitespace for
bytes) instead.
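A minimal sketch of the no-separator case for bytes, assuming Python's semantics: runs of ASCII whitespace (space, \t, \n, \r, \v, \f) act as a single separator, leading and trailing whitespace produce no empty pieces, and an all-whitespace input yields an empty list. SplitAsciiWhitespace is a hypothetical name, and maxsplit is left out for brevity. A UTF-8 version would additionally have to decode code points and test for Unicode whitespace, such as U+00A0, which Python's str.split() also splits on.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// ASCII whitespace as Python's bytes.split() treats it.
bool IsAsciiWhitespace(unsigned char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' ||
         c == '\f';
}

// Hypothetical helper mirroring Python's bytes.split() with no separator.
std::vector<std::string> SplitAsciiWhitespace(std::string_view s) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while (i < s.size()) {
    // Skip a run of whitespace; it never produces an empty piece.
    while (i < s.size() &&
           IsAsciiWhitespace(static_cast<unsigned char>(s[i]))) {
      ++i;
    }
    if (i == s.size()) break;
    // Collect one maximal run of non-whitespace bytes as a token.
    std::size_t start = i;
    while (i < s.size() &&
           !IsAsciiWhitespace(static_cast<unsigned char>(s[i]))) {
      ++i;
    }
    out.emplace_back(s.substr(start, i - start));
  }
  return out;
}
```

For example, SplitAsciiWhitespace("  a\tb \n c  ") yields {"a", "b", "c"}, as b"  a\tb \n c  ".split() does in Python.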
I'd rather not have too many overloaded kernels, e.g.:
# binary_split (takes a string/binary separator and a maxsplit argument; no
special utf8 version needed)
# utf8_split_whitespace (similar to Python's version given no separator)
# asi
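The proposed binary_split could look roughly like the byte-level sketch below, which assumes Python's maxsplit semantics: a negative value means no limit; otherwise at most that many splits are made, and the unsplit remainder becomes the last piece. BinarySplit is a hypothetical name, not an existing Arrow function. It also illustrates why no special utf8 version is needed: UTF-8 is self-synchronizing, so a valid UTF-8 separator can only match at code-point boundaries of valid UTF-8 input, and a plain byte search gives the same result as a character-aware one.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical byte-level split with Python-style maxsplit semantics.
std::vector<std::string> BinarySplit(std::string_view s, std::string_view sep,
                                     int64_t max_splits = -1) {
  if (sep.empty()) {
    // Python raises ValueError("empty separator") in this case.
    throw std::invalid_argument("empty separator");
  }
  std::vector<std::string> out;
  int64_t splits = 0;
  while (max_splits < 0 || splits < max_splits) {
    std::size_t pos = s.find(sep);
    if (pos == std::string_view::npos) break;
    out.emplace_back(s.substr(0, pos));  // piece before the separator
    s.remove_prefix(pos + sep.size());   // continue after the separator
    ++splits;
  }
  out.emplace_back(s);  // remainder (the whole input if nothing matched)
  return out;
}
```

For example, BinarySplit("a,b,c", ",", 1) yields {"a", "b,c"}, matching Python's "a,b,c".split(",", 1).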
--
This message was sent by Atlassian Jira
(v8.3.4#803005)