[ 
https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104666#comment-17104666
 ] 

Maarten Breddels commented on ARROW-555:
----------------------------------------

Something to consider (or should I move this discussion to the list?) is 
support for ASCII vs. UTF-8. I noticed that the Gandiva code assumes ASCII (or 
at least not UTF-8), while Arrow assumes strings are UTF-8 only. Having written 
the vaex string code, I'm pretty sure ASCII will be much faster, though (every 
character is exactly one byte, so you know the byte length of a string in 
advance). Is there interest in supporting more than UTF-8, for instance ASCII 
or UTF-16/32? Or should it be UTF-8 only?
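
To illustrate the speed argument, here is a minimal sketch in plain Python of 
why ASCII is cheaper. It assumes an Arrow-style layout (an offsets list plus 
one contiguous data buffer). The function names are hypothetical and are not 
Arrow, Gandiva, or vaex APIs. For pure ASCII, the character length of each 
value is just the difference of two offsets; for general UTF-8, every byte has 
to be inspected to skip continuation bytes.

```python
# Illustrative sketch only: all names here are hypothetical helpers,
# not part of Arrow, Gandiva, or vaex.

def build_string_array(strings):
    """Pack strings into an Arrow-style layout: offsets plus one UTF-8 buffer."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    return offsets, bytes(data)


def ascii_lengths(offsets):
    """Byte length of each value; equals the character length only for ASCII."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def utf8_lengths(offsets, data):
    """Code point count of each value: skip continuation bytes (10xxxxxx)."""
    lengths = []
    for i in range(len(offsets) - 1):
        chunk = data[offsets[i]:offsets[i + 1]]
        lengths.append(sum(1 for b in chunk if (b & 0xC0) != 0x80))
    return lengths


# "arrow" is pure ASCII; the other two values contain multi-byte characters.
values = ["arrow", "na\u00efve", "\u65e5\u672c\u8a9e"]
offsets, data = build_string_array(values)

print(ascii_lengths(offsets))       # byte lengths, no scan of the data buffer
print(utf8_lengths(offsets, data))  # code point lengths, scans every byte
```

The offset-only version never touches the data buffer, which is where the 
expected speedup would come from; the two results agree only for values that 
are pure ASCII.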

> [C++] String algorithm library for StringArray/BinaryArray
> ----------------------------------------------------------
>
>                 Key: ARROW-555
>                 URL: https://issues.apache.org/jira/browse/ARROW-555
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: Analytics
>
> This is a parent JIRA for starting a module for processing strings in-memory 
> arranged in Arrow format. This will include using the re2 C++ regular 
> expression library and other standard string manipulations (such as those 
> found on Python's string objects)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
