niyue opened a new issue, #39052: URL: https://github.com/apache/arrow/issues/39052
### Describe the enhancement requested

# Description

Currently, Gandiva has some internal stub functions, which are registered in two steps:

1) Function metadata is registered in multiple internal registry classes, such as:
   1.1) `GetStringFunctionRegistry` in `function_registry_string.cc`
   1.2) `GetMathOpsFunctionRegistry` in `function_registry_math_ops.cc`
   1.3) etc.
2) The stub functions' implementations are mapped to the LLVM engine in:
   2.1) `ExportedStubFunctions::AddMappings`
   2.2) `ExportedHashFunctions::AddMappings`
   2.3) `ExportedStringFunctions::AddMappings`

There are some issues with this approach:

* When adding or removing a stub function, developers need to find and change two places, which is inconvenient. For example, when adding a new string function, both `GetStringFunctionRegistry` in `function_registry_string.cc` and `ExportedStringFunctions::AddMappings` in `gdv_string_function_stubs.cc` need to be modified.
* The LLVM type information provided in the `AddMappings` API largely duplicates the function signature metadata provided in the `GetXXXFunctionRegistry` API, which costs developers extra time and effort to maintain.

# Proposal

In PR https://github.com/apache/arrow/pull/38632, we added the capability to programmatically map a `NativeFunction` signature to LLVM-typed args, so the LLVM args for each function in `AddMappings` could be derived directly from its `NativeFunction`.
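
To illustrate the idea of deriving LLVM-typed args from a signature and registering metadata and implementation together, here is a minimal, self-contained sketch. It does not use Gandiva's or LLVM's real APIs: `ArrowType`, `StubSignature`, `ToLlvmArgTypes`, `StubRegistry`, and `example_add_int32` are all hypothetical names invented for this example, and the string calling convention shown (data pointer plus length, with an output-length pointer) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Arrow data types (not Gandiva's real type system).
enum class ArrowType { kInt32, kInt64, kFloat64, kUtf8 };

// Hypothetical signature metadata, playing the role of a NativeFunction.
struct StubSignature {
  std::string name;
  std::vector<ArrowType> params;
  ArrowType ret;
  bool needs_context;  // assumed: some stubs take an execution context first
};

// Derive LLVM IR argument type names from the signature, so nobody has to
// write them by hand in a separate AddMappings-style function.
std::vector<std::string> ToLlvmArgTypes(const StubSignature& sig) {
  std::vector<std::string> args;
  if (sig.needs_context) args.push_back("i64");  // context passed as an integer
  for (ArrowType t : sig.params) {
    switch (t) {
      case ArrowType::kInt32: args.push_back("i32"); break;
      case ArrowType::kInt64: args.push_back("i64"); break;
      case ArrowType::kFloat64: args.push_back("double"); break;
      case ArrowType::kUtf8:  // assumed: strings are (data pointer, length)
        args.push_back("i8*");
        args.push_back("i32");
        break;
    }
  }
  // Assumed: a string-returning stub reports its length via an out pointer.
  if (sig.ret == ArrowType::kUtf8) args.push_back("i32*");
  return args;
}

struct RegisteredStub {
  StubSignature signature;
  void* impl;                          // address of the C implementation
  std::vector<std::string> llvm_args;  // derived, never hand-written
};

// One registration call stores the metadata, the implementation pointer,
// and the derived LLVM arg types in a single place.
class StubRegistry {
 public:
  void Register(StubSignature sig, void* impl) {
    assert(impl != nullptr);
    std::vector<std::string> llvm_args = ToLlvmArgTypes(sig);
    std::string key = sig.name;
    stubs_[key] = RegisteredStub{std::move(sig), impl, std::move(llvm_args)};
  }

  // Returns nullptr when no stub with this name has been registered.
  const RegisteredStub* Lookup(const std::string& name) const {
    auto it = stubs_.find(name);
    return it == stubs_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, RegisteredStub> stubs_;
};

// A hypothetical stub implementation with C linkage.
extern "C" int32_t example_add_int32(int32_t a, int32_t b) { return a + b; }
```

With a layout like this, adding a stub would be a single `Register` call that passes the signature and `reinterpret_cast<void*>(&example_add_int32)`; the LLVM arg list follows from the signature rather than being maintained separately.
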
This proposal plans to use the `FunctionRegistry`'s `Register` C function API to register the existing stub functions internally, leveraging the mapping capability above. For stub functions, metadata registration and implementation mapping can then be combined into one step, so that:

* stub function metadata and implementation are associated and registered in one place, and developers don't have to look in two places for maintenance
* when adding or updating a stub function's signature, developers no longer need to manually map the Arrow data type signature to LLVM-typed args, which is easier to maintain and less error-prone

This will also simplify the code considerably: the change is expected to remove 1500+ lines of code.

### Component(s)

C++ - Gandiva

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
