niyue commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1357691913
########## cpp/src/gandiva/engine.cc: ##########

@@ -152,6 +153,7 @@ Status Engine::LoadFunctionIRs() {
   if (!functions_loaded_) {
     ARROW_RETURN_NOT_OK(LoadPreCompiledIR());
     ARROW_RETURN_NOT_OK(DecimalIR::AddFunctions(this));
+    ARROW_RETURN_NOT_OK(LoadExternalPreCompiledIR());

Review Comment:

# Issue 1

> I like the std::vector<NativeFunction> approach. I think that managing both of signatures and bitcodes by FunctionRegistry is better than introducing IR store for bitcodes. The signature and bitcode mapping will be useful for loading bitcode to gandiva::Engine selectively for performance

I agree. It seems simpler for users to call just one API, since I don't see any scenario where users would call only one of the two APIs (`FunctionRegistry::Add`, `LLVMExternalIRStore::Add`). Additionally, we could take advantage of this signature <==> bitcode mapping later to optimize performance during LLVM module construction. @js8544 do you have any opinion on this idea? Thanks.

# Issue 2

> Could you explain "some shared library mechanism" a bit more?
> I imagine that gcc -shared -o multiply_by_two.so multiply_by_two.cc and handle = dlopen("multiply_by_two.so"); dlsym(handle, "multiply_by_two_int32")) to gandiva::Engine

Sorry, I didn't make that clear enough. This probably warrants another PR, so let me add some background here.

This PR allows users to call pre-compiled functions from LLVM IR/bitcode: Gandiva users need to generate LLVM IR for their functions to integrate them. This approach works for small and simple functions, typically functions that don't depend on any libraries. However, if a function needs to perform a more complex task, its author will typically want to rely on third-party libraries. For example, if I wanted to implement a function that calls a service via HTTP with JSON request/response (e.g. the Twitter API or OpenAI API), that function would have to depend on an HTTP library and a JSON parsing library (or even some async IO library).
If all these dependent libraries are also compiled into LLVM IR, the resulting IR/bitcode will be non-trivial in size, and this will make LLVM module construction pretty slow. So I think it may help to compile this kind of complex function into a shared library (instead of LLVM IR), and have the LLVM engine load the shared library at runtime and look up the function directly in it. I briefly looked into how this could be done, but more investigation is needed later:

1. LLVM has an API called `DynamicLibrary::LoadLibraryPermanently` which can be used to load a shared library at runtime.
   1. This is the `DynamicLibrary` API doc in LLVM: https://llvm.org/doxygen/classllvm_1_1sys_1_1DynamicLibrary.html
   2. The `Engine::InitOnce` function in `engine.cc:115` currently calls this `LoadLibraryPermanently` API, but it loads nothing at the moment.
2. As you said, we can use `dlopen` and `dlsym` to get a function pointer from the shared library.
3. `Engine::AddGlobalMappingForFunc` is used to programmatically set up a mapping from function name to function pointer; currently, some functions like `gdv_fn_context_arena_malloc` are mapped via this API.
4. `void* Engine::CompiledFunction(llvm::Function* irFunction)` is used to look up the function in the module.

I think it is possible to piece all of these together and address the performance issue with complex functions described above.

Back to the original topic: thinking about it again, if we really want to pursue such a goal, the API for function registration could be something like `FunctionRegistry::Add(std::vector<NativeFunction> funcs, const std::string path_to_some_shared_library)` if we adopt the approach mentioned in the Issue 1 section.