Maarten Breddels created ARROW-9991:
---------------------------------------

             Summary: [C++] split kernels for strings/binary
                 Key: ARROW-9991
                 URL: https://issues.apache.org/jira/browse/ARROW-9991
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Maarten Breddels
            Assignee: Maarten Breddels


Similar to Python's str.split and bytes.split, we'd like a way to convert str
into list[str] (and likewise bytes into list[bytes]).

When a separator is given, the algorithm is the same for both types.
Python, however, overloads split: when no separator is given, it splits on
runs of whitespace instead (Unicode whitespace for str, ASCII whitespace for
bytes).
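A minimal sketch of the separator-based split, illustrating why one kernel can serve both types: it only compares bytes, and because UTF-8 is self-synchronizing, a valid UTF-8 separator can never match starting in the middle of a code point. SplitOnSeparator is a hypothetical helper, not an Arrow API; it works on a single std::string_view rather than an Arrow array, follows Python's convention that a negative maxsplit means "no limit", and assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical helper (not an Arrow API): splits `input` on every occurrence
// of the byte sequence `sep`, mirroring Python's str.split(sep, maxsplit) and
// bytes.split(sep, maxsplit). A negative maxsplit means "no limit".
// Only bytes are compared, so the same code serves UTF-8 strings and binary.
// The returned views point into `input`, which must outlive them.
std::vector<std::string_view> SplitOnSeparator(std::string_view input,
                                               std::string_view sep,
                                               int64_t maxsplit = -1) {
  if (sep.empty()) {
    // Python raises ValueError("empty separator") in this case.
    throw std::invalid_argument("empty separator");
  }
  std::vector<std::string_view> parts;
  std::size_t start = 0;
  int64_t splits = 0;
  while (maxsplit < 0 || splits < maxsplit) {
    std::size_t pos = input.find(sep, start);
    if (pos == std::string_view::npos) break;
    parts.push_back(input.substr(start, pos - start));  // may be empty
    start = pos + sep.size();
    ++splits;
  }
  // Whatever follows the last split point (possibly empty) is the final part.
  parts.push_back(input.substr(start));
  return parts;
}
```

For example, splitting "a,,b" on "," yields "a", "" and "b", and splitting "a,b,c" on "," with maxsplit 1 yields "a" and "b,c", matching Python's behaviour. Returning views instead of copies keeps the sketch allocation-light; a real kernel would instead write offsets and data into a list<string> or list<binary> output array.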

I'd rather not have too many overloaded kernels, e.g.:
 # binary_split (takes a string/binary separator and a maxsplit argument; no special utf8 version needed)
 # utf8_split_whitespace (similar to Python's version when no separator is given)
 # asi
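Without a separator, Python's bytes.split() treats each run of ASCII whitespace (space, \t, \n, \r, \v, \f) as a single separator and never produces empty parts for leading or trailing whitespace. The sketch below, again a hypothetical helper on a single std::string_view (C++17), shows that behaviour, including Python's rule that once maxsplit is reached the remainder is kept with only its leading whitespace removed. A UTF-8 variant such as utf8_split_whitespace would additionally have to decode code points and test them against Unicode whitespace; that part is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// ASCII whitespace as defined by Python's bytes.isspace():
// space, \t, \n, \r, \v (0x0b) and \f (0x0c).
bool IsAsciiWhitespace(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' ||
         c == '\f';
}

// Hypothetical helper (not an Arrow API): splits `input` on runs of ASCII
// whitespace, mirroring Python's bytes.split(None, maxsplit).
// A negative maxsplit means "no limit". Leading/trailing whitespace never
// yields empty parts; an all-whitespace input yields no parts at all.
std::vector<std::string_view> SplitAsciiWhitespace(std::string_view input,
                                                   int64_t maxsplit = -1) {
  std::vector<std::string_view> parts;
  const std::size_t n = input.size();
  std::size_t i = 0;
  int64_t splits = 0;
  while (true) {
    // Skip a run of whitespace before the next token.
    while (i < n && IsAsciiWhitespace(input[i])) ++i;
    if (i == n) break;
    if (maxsplit >= 0 && splits == maxsplit) {
      // Limit reached: keep the remainder as-is, trailing whitespace included.
      parts.push_back(input.substr(i));
      break;
    }
    // Consume one token of non-whitespace bytes.
    const std::size_t start = i;
    while (i < n && !IsAsciiWhitespace(input[i])) ++i;
    parts.push_back(input.substr(start, i - start));
    ++splits;
  }
  return parts;
}
```

For instance, "  a  b  c  " with maxsplit 1 gives "a" and "b  c  ", as Python's b"  a  b  c  ".split(None, 1) does. Bytes above 0x7f are never treated as whitespace, so multi-byte UTF-8 sequences pass through intact, but non-ASCII Unicode whitespace is not recognized either.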



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
