[
https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104666#comment-17104666
]
Maarten Breddels commented on ARROW-555:
----------------------------------------
Something to consider (or should I move this discussion to the mailing list?) is
support for ASCII vs. UTF-8. I noticed that the Gandiva code assumes ASCII (or at
least not UTF-8), while Arrow assumes strings are UTF-8 only. Having written the
vaex string code, I'm pretty sure ASCII will be much faster, since every
character is exactly one byte, so a string's character length equals its byte
length and is known in advance. Is there interest in supporting encodings other
than UTF-8, such as ASCII or UTF-16/32? Or should it be UTF-8 only?
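
To make the cost difference concrete, here is a minimal sketch in Python. It assumes an Arrow-style layout (one contiguous data buffer plus an offsets list) built by hand; the helper names utf8_char_count and ascii_char_count are hypothetical and are not part of the Arrow, Gandiva, or vaex APIs. For ASCII the character length falls out of the offsets alone, while for UTF-8 every byte has to be inspected.

```python
def utf8_char_count(data: bytes, start: int, end: int) -> int:
    """Count code points in data[start:end].

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every byte
    that is NOT a continuation byte starts a new code point. This is O(n)
    in the byte length of the string.
    """
    return sum(1 for b in data[start:end] if (b & 0xC0) != 0x80)


def ascii_char_count(start: int, end: int) -> int:
    """For ASCII data, character length equals byte length: O(1) from offsets."""
    return end - start


# Hand-built, Arrow-style layout (an assumption for illustration only):
# all strings concatenated in one buffer, with offsets marking boundaries.
values = ["abc", "héllo", "日本"]
encoded = [v.encode("utf-8") for v in values]
data = b"".join(encoded)

offsets = [0]
for chunk in encoded:
    offsets.append(offsets[-1] + len(chunk))

# Character lengths require scanning the bytes of each string.
utf8_lengths = [
    utf8_char_count(data, offsets[i], offsets[i + 1]) for i in range(len(values))
]

# Byte lengths come straight from the offsets; for pure-ASCII strings
# (here only "abc") they are also the character lengths.
byte_lengths = [
    ascii_char_count(offsets[i], offsets[i + 1]) for i in range(len(values))
]

print(utf8_lengths)  # [3, 5, 2]
print(byte_lengths)  # [3, 6, 6]
```

The same asymmetry applies to character-based slicing: with ASCII the n-th character sits at byte offset n, whereas with UTF-8 it has to be located by scanning from the start of the string.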
> [C++] String algorithm library for StringArray/BinaryArray
> ----------------------------------------------------------
>
> Key: ARROW-555
> URL: https://issues.apache.org/jira/browse/ARROW-555
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Labels: Analytics
>
> This is a parent JIRA for starting a module for processing strings in-memory
> arranged in Arrow format. This will include using the re2 C++ regular
> expression library and other standard string manipulations (such as those
> found on Python's string objects).