[ https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104443#comment-17104443 ]

Wes McKinney commented on ARROW-555:
------------------------------------
Update: I'm in the middle of overhauling the API for implementing new array
functions / kernels, with the goal of making it much easier to add new
functions (e.g. generating a string function from an inlineable implementation
that computes a single value). I'm working on it right now, so it should be
done this month. Once it is, I will probably ask someone from my team to make
an initial cut at a precompiled string function set, based on the functions
already in Gandiva / LLVM codegen plus new functions (from e.g. Impala or
other SQL engines) that are not yet present. The work need not be monolithic:
as soon as the framework is in place, it should be straightforward to add new
functions and test them. Adding Python bindings for the new functions should
also be easy: all you will need is the name of the function you're calling,
so some of the Cython binding boilerplate that exists now should go away.
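
To make the idea concrete, here is a rough sketch of what "generating a string
function from a single-value implementation" and "calling a function by name"
could look like. It is not the Arrow API: StringColumn, MapStrings,
FunctionRegistry, MakeDefaultRegistry and the name "ascii_upper" are all
hypothetical stand-ins, and a std::vector of std::optional<std::string> takes
the place of a real StringArray. It assumes C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a nullable string array (NOT arrow::StringArray).
using StringColumn = std::vector<std::optional<std::string>>;

// Lift an inlineable per-value function into a whole-column kernel.
// Null slots are propagated; the scalar op only ever sees valid values.
template <typename ScalarOp>
StringColumn MapStrings(const StringColumn& input, ScalarOp op) {
  StringColumn out;
  out.reserve(input.size());
  for (const auto& value : input) {
    if (value.has_value()) {
      out.emplace_back(op(*value));
    } else {
      out.emplace_back(std::nullopt);
    }
  }
  return out;
}

// Example single-value implementation: ASCII-only upper-casing.
inline std::string AsciiUpper(const std::string& s) {
  std::string result = s;
  std::transform(result.begin(), result.end(), result.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return result;
}

// Hypothetical name-based registry: a binding layer would only need the
// function name to dispatch, not per-function glue code.
class FunctionRegistry {
 public:
  template <typename ScalarOp>
  void Add(const std::string& name, ScalarOp op) {
    assert(kernels_.count(name) == 0);  // each name is registered once
    kernels_[name] = [op](const StringColumn& in) { return MapStrings(in, op); };
  }

  // Throws std::out_of_range if no function with this name is registered.
  StringColumn Call(const std::string& name, const StringColumn& in) const {
    return kernels_.at(name)(in);
  }

 private:
  std::map<std::string, std::function<StringColumn(const StringColumn&)>> kernels_;
};

// Builds a registry holding the one example function above.
inline FunctionRegistry MakeDefaultRegistry() {
  FunctionRegistry registry;
  registry.Add("ascii_upper", AsciiUpper);
  return registry;
}
```

With something of this shape, adding a new string function would amount to
writing the per-value implementation and one registry.Add(...) line, and a
caller would invoke it as registry.Call("ascii_upper", column).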
> [C++] String algorithm library for StringArray/BinaryArray
> ----------------------------------------------------------
>
> Key: ARROW-555
> URL: https://issues.apache.org/jira/browse/ARROW-555
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Labels: Analytics
>
> This is a parent JIRA for starting a module for processing in-memory strings
> arranged in Arrow format. This will include using the re2 C++ regular
> expression library and other standard string manipulations (such as those
> found on Python's string objects).