niyue commented on code in PR #38763:
URL: https://github.com/apache/arrow/pull/38763#discussion_r1409237889

##########
docs/source/cpp/gandiva_external_func.rst:
##########
@@ -0,0 +1,251 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+============================================
+Gandiva External Functions Development Guide
+============================================
+
+1. Introduction
+===============
+
+Gandiva, as an analytical expression compiler framework, extends its functionality through external functions. This guide is focused on helping developers understand, create, and integrate external functions into Gandiva. External functions are user-defined, third-party functions that can be used in Gandiva expressions.
+
+2. Overview of External Function Types in Gandiva
+=================================================
+
+Gandiva supports two primary types of external functions:
+
+* C Functions: Functions conforming to the C calling convention. Developers can implement functions in various languages (like C++, Rust, C, or Zig) and expose them as C functions for Gandiva.
+
+* IR Functions: Functions implemented in LLVM's Intermediate Representation (IR). These can be written in multiple languages and then compiled into LLVM IR to be registered in Gandiva.
+
+2.1 Choosing the Right Type of External Function for Your Needs
+---------------------------------------------------------------
+
+When integrating external functions into Gandiva, it's crucial to select the type that best fits your specific requirements. Here are the key distinctions between C Functions and IR Functions to guide your decision:
+
+* C Functions
+    * **Language Flexibility:** C functions offer the flexibility to implement your logic in a preferred programming language and subsequently expose them as C functions.
+    * **Broad Applicability:** They are generally a go-to choice for a wide range of use cases due to their compatibility and ease of integration.
+
+* IR Functions
+    * **IR Compilation Requirement:** For IR functions, the entire implementation, including any third-party libraries used, must be compiled into LLVM IR. This might affect performance, especially if the dependent libraries are complex.
+    * **Limitations in Capabilities:** Certain advanced features, such as using thread-local variables, are not supported in IR functions. This is due to the limitations of the current JIT (Just-In-Time) engine utilized internally by Gandiva.
+    * **Recommended Use Cases:** IR functions are best suited for simpler tasks that don't demand intricate logic or reliance on complex third-party libraries. They are also a good fit if your project already incorporates the LLVM toolchain.
+
+3. External function registration
+=================================
+
+To make a function available to Gandiva, you need to register it as an external function, providing both a function's metadata and its implementation to Gandiva.
+
+3.1 Using the NativeFunction Class
+----------------------------------
+
+To register a function in Gandiva, use the ``gandiva::NativeFunction`` class. This class captures both the signature and metadata of the external function.
+
+Constructor Details for ``gandiva::NativeFunction``:
+
+.. code-block:: cpp
+
+    NativeFunction(const std::string& base_name, const std::vector<std::string>& aliases,
+                   const DataTypeVector& param_types, const DataTypePtr& ret_type,
+                   const ResultNullableType& result_nullable_type, std::string pc_name,
+                   int32_t flags = 0);
+
+The ``NativeFunction`` class is used to define the metadata for an external function. Here is a breakdown of its constructor parameters:
+
+* ``base_name``: The name of the function as it will be used in expressions.
+* ``aliases``: A list of alternative names for the function.
+* ``param_types``: A vector of ``arrow::DataType`` objects representing the types of the parameters that the function accepts.
+* ``ret_type``: A ``std::shared_ptr<arrow::DataType>`` representing the return type of the function.
+* ``result_nullable_type``: This parameter indicates whether the result can be null, based on the nullability of the input arguments. It can take one of the following values:
+    * ``ResultNullableType::kResultNullIfNull``: result validity is an intersection of the validity of the children.
+    * ``ResultNullableType::kResultNullNever``: result is always valid.
+    * ``ResultNullableType::kResultNullInternal``: result validity depends on some internal logic.
+* ``pc_name``: The name of the corresponding precompiled function.
+    * Typically, this name follows the convention ``{base_name}`` + ``_{param1_type}`` + ``_{param2_type}`` + ... + ``_{paramN_type}``. For example, if the base name is ``add`` and the function takes two ``int32`` parameters and returns an ``int32``, the precompiled function name would be ``add_int32_int32``, but this convention is not mandatory as long as you can guarantee its uniqueness.
+* ``flags``: Optional flags for additional function attributes (default is 0). Please check out ``NativeFunction::kNeedsContext``, ``NativeFunction::kNeedsFunctionHolder``, and ``NativeFunction::kCanReturnErrors`` for more details.
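
The ``pc_name`` naming convention described above can be illustrated with a small standalone helper. This is only a sketch: ``MakePcName`` is a hypothetical function, not part of Gandiva, and it assumes the type names are already available as plain strings such as ``int32``. It joins the base name and each parameter type name with underscores, matching the ``add_int32_int32`` example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Gandiva API): builds a precompiled function
// name by appending "_<type>" for every parameter type to the base name.
std::string MakePcName(const std::string& base_name,
                       const std::vector<std::string>& param_type_names) {
  std::string pc_name = base_name;
  for (const auto& type_name : param_type_names) {
    pc_name += "_";
    pc_name += type_name;
  }
  return pc_name;
}
```

Calling ``MakePcName("add", {"int32", "int32"})`` yields ``add_int32_int32``. Since the convention is not mandatory, any other scheme that keeps names unique would work equally well.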
+
+3.2 External C functions
+------------------------
+
+External C functions can be authored in different languages and exposed as C functions. Compatibility with Gandiva's type system is crucial.
+
+3.2.1 C Function Signature
+**************************
+
+3.2.1.1 Signature Mapping
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following table lists the mapping between Gandiva external function signature types and the C function signature types:

Review Comment:
   @js8544 I added `time64`, `binary`, `interval_month`, and `interval_day_time`, but I am not sure how `decimal` should be represented in C functions, so it is not added yet. Are there any sample stub functions with a decimal parameter that I can learn from? Thanks
