[ 
https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051302#comment-17051302
 ] 

Wes McKinney commented on ARROW-555:
------------------------------------

We've been having some discussions about this topic in other places, e.g. 
ARROW-7083. One idea that has been proposed is to generate single-function 
kernels at compile time based on the LLVM IR that Gandiva spits out. So the 
process would work like this:

* Generate a library of LLVM IR for all supported Gandiva kernels, with an 
exported manifest so that you can dynamically determine which kernels are 
available and what their input and output signatures are
* Compile that LLVM IR into a C shared library
* Implement a generic "invoker" that takes a C function kernel (the result of 
compiling the LLVM IR produced by Gandiva) and evaluates it (with memory 
allocation, etc. as needed)
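
To make the manifest and invoker steps above more concrete, here is a minimal 
sketch of how they might fit together. Every name in it (TypeId, KernelFn, 
KernelEntry, Manifest, FindKernel, Invoke) is hypothetical and is not part of 
the Arrow or Gandiva APIs. The two kernels are written by hand here only as 
stand-ins for functions that would come from compiling the generated LLVM IR, 
and the shared calling convention is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type tags; a real manifest would describe Arrow data types.
enum class TypeId { Int32, Float64 };

// Assumed calling convention shared by every precompiled kernel:
// read `length` values from each input buffer, write `length` values to `out`.
using KernelFn = void (*)(const void* const* inputs, void* out, int64_t length);

// Stand-ins for functions compiled ahead of time from Gandiva's LLVM IR.
extern "C" void add_int32(const void* const* inputs, void* out, int64_t length) {
  const auto* a = static_cast<const int32_t*>(inputs[0]);
  const auto* b = static_cast<const int32_t*>(inputs[1]);
  auto* result = static_cast<int32_t*>(out);
  for (int64_t i = 0; i < length; ++i) result[i] = a[i] + b[i];
}

extern "C" void negate_float64(const void* const* inputs, void* out, int64_t length) {
  const auto* a = static_cast<const double*>(inputs[0]);
  auto* result = static_cast<double*>(out);
  for (int64_t i = 0; i < length; ++i) result[i] = -a[i];
}

// One manifest row: kernel name, input/output signature, and entry point.
struct KernelEntry {
  std::string name;
  std::vector<TypeId> input_types;
  TypeId output_type;
  KernelFn fn;
};

// The exported manifest; in the proposal this would be generated, not hand-written.
const std::vector<KernelEntry>& Manifest() {
  static const std::vector<KernelEntry> entries = {
      {"add", {TypeId::Int32, TypeId::Int32}, TypeId::Int32, &add_int32},
      {"negate", {TypeId::Float64}, TypeId::Float64, &negate_float64},
  };
  return entries;
}

// Look up a kernel by name and input signature; returns nullptr if none matches.
const KernelEntry* FindKernel(const std::string& name,
                              const std::vector<TypeId>& input_types) {
  for (const auto& entry : Manifest()) {
    if (entry.name == name && entry.input_types == input_types) return &entry;
  }
  return nullptr;
}

std::size_t ByteWidth(TypeId type) { return type == TypeId::Int32 ? 4 : 8; }

// Generic invoker: allocates the output buffer, then calls the C kernel.
std::vector<uint8_t> Invoke(const KernelEntry& kernel,
                            const std::vector<const void*>& inputs,
                            int64_t length) {
  if (inputs.size() != kernel.input_types.size()) {
    throw std::invalid_argument("wrong number of inputs for kernel " + kernel.name);
  }
  std::vector<uint8_t> out(static_cast<std::size_t>(length) *
                           ByteWidth(kernel.output_type));
  kernel.fn(inputs.data(), out.data(), length);
  return out;
}
```

A caller would look up a kernel with FindKernel("add", {TypeId::Int32, 
TypeId::Int32}) and pass the result to Invoke along with its input buffers. 
Nothing in that path needs LLVM at runtime. Null handling and variable-width 
types such as strings are left out of this sketch.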

Then the LLVM runtime would not be required to use the output of this process.

Setting up the machinery for this would require some investment of time 
(perhaps not that much), but it would seem to greatly simplify the process 
of implementing new kernels, especially simple elementwise functions (for 
numbers, strings, etc.)

We've been dancing around this idea for several months now, so I would be 
interested to see whether someone wants to explore it before we tunnel too 
far in different directions. 

cc [~emkornfield] [~apitrou] [~fsaintjacques] [~jnadeau] [~ravindra] for any 
comments / thoughts on whether what I've written above jibes with prior 
discussions

> [C++] String algorithm library for StringArray/BinaryArray
> ----------------------------------------------------------
>
>                 Key: ARROW-555
>                 URL: https://issues.apache.org/jira/browse/ARROW-555
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: Analytics
>
> This is a parent JIRA for starting a module for processing strings in-memory 
> arranged in Arrow format. This will include using the re2 C++ regular 
> expression library and other standard string manipulations (such as those 
> found on Python's string objects)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
