> The definition of an external function registry can certainly belong in Gandiva, but how it's populated should be left to third-party projects
Are you proposing a more general approach, like incorporating the following
APIs into Gandiva? (Please note that the function names/signatures are
tentative and just meant for illustrative purposes.)

1) AddExternalFunctionRegistry(ExternalFunctionRegistry function_registry)
2) AddFunctionBitcodeLoader(FunctionBitcodeLoader bitcode_loader)

Where `ExternalFunctionRegistry` can return a list of function definitions
and `FunctionBitcodeLoader` can return a list of bitcode buffers, so that
the specific metadata/bitcode data population logic can be moved out of
Gandiva? Thanks.

Regards,
Yue

On Tue, Sep 26, 2023 at 12:25 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Yue,
>
> On 25/09/2023 at 18:15, Yue Ni wrote:
> >
> >> a CMake entrypoint (for example a function) making it easy for
> >> third-party projects to compile their own functions
> > I can come up with a minimum CMake template so that users can compile
> > C++ based functions, and I think if the integration happens at the
> > LLVM IR level, it is possible to author the functions beyond C++
> > languages, such as Rust/Zig as long as the compiler can generate LLVM
> > IR (there are other issues that need to be addressed from the Rust
> > experiment I made, but that can be another proposal/PR). If we make
> > that work, CMake is probably not so important either since other
> > languages can use their own build tools such as Cargo/zig build, and
> > we just need some documentation to describe how it should be
> > interfaced typically.
>
> As long as there's a well-known and supported way to generate the code
> for external functions, then it's fine to me.
>
> (also the required signature for these functions should be documented
> somewhere)
>
> >> The rest of the proposal (a specific JSON file format, a bunch of
> >> functions to iterate directory entries in a specific layout) is IMHO
> >> off-topic for Gandiva, and each third-party project can implement
> >> their own idioms for the discovery of external functions
> >
> > Could you give some more guidance on how this should work without an
> > external function registry containing metadata? As far as I know, for
> > each pre-compiled function used in an expression, Gandiva needs to
> > lookup its signature from the function registry, which currently is a
> > C++ class that is hard coded to contain 6 categories of built-in
> > functions (arithmetic/datetime/hash/mathops/string/datetime
> > arithmetic). If a third party function cannot be found in the
> > registry, it cannot be used in the expression. If we don't load the
> > pre-compiled function metadata from external files, how do we avoid
> > Gandiva rejecting the expression when a third party function cannot
> > be found in the function registry? Thanks.
>
> What I'm saying is that code to load function metadata from JSON and
> walk directories of .bc files does not belong in Gandiva. The definition
> of an external function registry can certainly belong in Gandiva, but
> how it's populated should be left to third-party projects (which then
> don't have to use JSON or a given directory layout).
>
> Regards
>
> Antoine.
>
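
For illustration only, the two tentative APIs at the top of this message could be sketched in C++ roughly as follows. Only the names AddExternalFunctionRegistry, AddFunctionBitcodeLoader, `ExternalFunctionRegistry` and `FunctionBitcodeLoader` come from the message, which itself marks them as tentative. Everything else (`FunctionSignature`, `EngineSketch`, `GetFunctions`, `GetBitcodeBuffers`, the in-memory implementations, the `my_upper` function) is a hypothetical stand-in and not existing Gandiva API. The point of the sketch is that the engine side only consumes two interfaces, while the third-party side decides how to populate them (here in code, with no JSON or directory layout).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata for one external function (not Gandiva's real type).
struct FunctionSignature {
  std::string name;                      // name used in expressions
  std::vector<std::string> param_types;  // parameter type names
  std::string return_type;               // return type name
  std::string symbol;                    // symbol name inside the bitcode
};

// Tentative interface from the message: returns function definitions.
class ExternalFunctionRegistry {
 public:
  virtual ~ExternalFunctionRegistry() = default;
  virtual std::vector<FunctionSignature> GetFunctions() const = 0;
};

// Tentative interface from the message: returns bitcode buffers.
class FunctionBitcodeLoader {
 public:
  virtual ~FunctionBitcodeLoader() = default;
  virtual std::vector<std::vector<std::uint8_t>> GetBitcodeBuffers() const = 0;
};

// Hypothetical stand-in for the Gandiva side. It only collects what the
// interfaces return and knows nothing about where the data came from.
class EngineSketch {
 public:
  void AddExternalFunctionRegistry(
      const std::shared_ptr<ExternalFunctionRegistry>& registry) {
    assert(registry != nullptr);
    for (auto& sig : registry->GetFunctions()) {
      signatures_.push_back(std::move(sig));
    }
  }

  void AddFunctionBitcodeLoader(
      const std::shared_ptr<FunctionBitcodeLoader>& loader) {
    assert(loader != nullptr);
    for (auto& buffer : loader->GetBitcodeBuffers()) {
      bitcode_.push_back(std::move(buffer));
    }
  }

  // True if a function with this name has been registered.
  bool HasFunction(const std::string& name) const {
    for (const auto& sig : signatures_) {
      if (sig.name == name) return true;
    }
    return false;
  }

  std::size_t BitcodeBufferCount() const { return bitcode_.size(); }

 private:
  std::vector<FunctionSignature> signatures_;
  std::vector<std::vector<std::uint8_t>> bitcode_;
};

// Third-party side: a registry populated directly in code.
class InMemoryRegistry : public ExternalFunctionRegistry {
 public:
  explicit InMemoryRegistry(std::vector<FunctionSignature> functions)
      : functions_(std::move(functions)) {}
  std::vector<FunctionSignature> GetFunctions() const override {
    return functions_;
  }

 private:
  std::vector<FunctionSignature> functions_;
};

// Third-party side: a loader that hands over buffers it already holds.
class InMemoryBitcodeLoader : public FunctionBitcodeLoader {
 public:
  explicit InMemoryBitcodeLoader(std::vector<std::vector<std::uint8_t>> buffers)
      : buffers_(std::move(buffers)) {}
  std::vector<std::vector<std::uint8_t>> GetBitcodeBuffers() const override {
    return buffers_;
  }

 private:
  std::vector<std::vector<std::uint8_t>> buffers_;
};

// Usage: register one hypothetical function and one placeholder buffer.
EngineSketch BuildExampleEngine() {
  FunctionSignature sig{"my_upper", {"utf8"}, "utf8", "my_upper_utf8"};
  std::vector<std::uint8_t> placeholder = {0x01, 0x02, 0x03, 0x04};

  std::vector<std::vector<std::uint8_t>> buffers;
  buffers.push_back(placeholder);

  EngineSketch engine;
  engine.AddExternalFunctionRegistry(std::make_shared<InMemoryRegistry>(
      std::vector<FunctionSignature>{sig}));
  engine.AddFunctionBitcodeLoader(
      std::make_shared<InMemoryBitcodeLoader>(std::move(buffers)));
  return engine;
}
```

A project that does prefer JSON files or a directory of .bc files could implement the same two interfaces on its side, without Gandiva needing to know about either.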