niyue commented on code in PR #38763:
URL: https://github.com/apache/arrow/pull/38763#discussion_r1409237889


##########
docs/source/cpp/gandiva_external_func.rst:
##########
@@ -0,0 +1,251 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+============================================
+Gandiva External Functions Development Guide
+============================================
+
+1. Introduction
+===============
+
+Gandiva, an analytical expression compiler framework, can be extended with 
external functions. This guide helps developers understand, create, and 
integrate external functions into Gandiva. External functions are user-defined, 
third-party functions that can be used in Gandiva expressions.
+
+2. Overview of External Function Types in Gandiva
+=================================================
+
+Gandiva supports two primary types of external functions:
+
+* C Functions: Functions conforming to the C calling convention. Developers 
can implement functions in various languages (like C++, Rust, C, or Zig) and 
expose them as C functions for Gandiva.
+
+* IR Functions: Functions implemented in LLVM's Intermediate Representation 
(IR). These can be written in multiple languages and then compiled into LLVM IR 
to be registered in Gandiva.
+
+2.1 Choosing the Right Type of External Function for Your Needs
+---------------------------------------------------------------
+
+When integrating external functions into Gandiva, it's crucial to select the 
type that best fits your specific requirements. Here are the key distinctions 
between C Functions and IR Functions to guide your decision:
+
+* C Functions
+    * **Language Flexibility:** C functions let you implement your logic in 
your preferred programming language and then expose it as C functions.
+    * **Broad Applicability:** They are generally a go-to choice for a wide 
range of use cases due to their compatibility and ease of integration.
+
+* IR Functions
+    * **IR Compilation Requirement:** For IR functions, the entire 
implementation, including any third-party libraries used, must be compiled into 
LLVM IR. This might affect performance, especially if the dependent libraries 
are complex.
+    * **Limitations in Capabilities:** Certain advanced features, such as 
using thread-local variables, are not supported in IR functions. This is due to 
the limitations of the current JIT (Just-In-Time) engine utilized internally by 
Gandiva.
+    * **Recommended Use Cases:** IR functions are best suited for simpler 
tasks that don't demand intricate logic or reliance on complex third-party 
libraries. They are also a good fit if your project already incorporates the 
LLVM toolchain.
+
+3. External Function Registration
+=================================
+
+To make a function available to Gandiva, you need to register it as an 
external function, providing both the function's metadata and its 
implementation to Gandiva.
+
+3.1 Using the NativeFunction Class
+----------------------------------
+
+To register a function in Gandiva, use the ``gandiva::NativeFunction`` class. 
This class captures both the signature and metadata of the external function.
+
+Constructor Details for ``gandiva::NativeFunction``:
+
+.. code-block:: cpp
+
+    NativeFunction(const std::string& base_name,
+                   const std::vector<std::string>& aliases,
+                   const DataTypeVector& param_types, const DataTypePtr& ret_type,
+                   const ResultNullableType& result_nullable_type, std::string pc_name,
+                   int32_t flags = 0);
+
+The ``NativeFunction`` class is used to define the metadata for an external 
function. Here is a breakdown of its constructor parameters:
+
+* ``base_name``: The name of the function as it will be used in expressions.
+* ``aliases``: A list of alternative names for the function.
+* ``param_types``: A vector of ``arrow::DataType`` objects representing the 
types of the parameters that the function accepts.
+* ``ret_type``: A ``std::shared_ptr<arrow::DataType>`` representing the return 
type of the function.
+* ``result_nullable_type``: This parameter indicates whether the result can be 
null, based on the nullability of the input arguments. It can take one of the 
following values:
+    * ``ResultNullableType::kResultNullIfNull``: result validity is an 
intersection of the validity of the children.
+    * ``ResultNullableType::kResultNullNever``: result is always valid.
+    * ``ResultNullableType::kResultNullInternal``: result validity depends on 
some internal logic.
+* ``pc_name``: The name of the corresponding precompiled function.
+    * Typically, this name follows the convention ``{base_name}`` + 
``_{param1_type}`` + ``_{param2_type}`` + ... + ``_{paramN_type}``. For example, 
if the base name is ``add`` and the function takes two ``int32`` parameters and 
returns an ``int32``, the precompiled function name would be 
``add_int32_int32``. This convention is not mandatory as long as the name is 
unique.
+* ``flags``: Optional flags for additional function attributes (default is 0). 
See ``NativeFunction::kNeedsContext``, 
``NativeFunction::kNeedsFunctionHolder``, and 
``NativeFunction::kCanReturnErrors`` for details.
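
The ``pc_name`` naming convention described above can be sketched as a small helper. This is only an illustration: ``MakePcName`` is a hypothetical function and not part of Gandiva, and the type names are passed in as plain strings rather than derived from ``arrow::DataType`` objects.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Gandiva API): builds a precompiled function
// name following the pattern {base_name}_{param1_type}_..._{paramN_type}.
std::string MakePcName(const std::string& base_name,
                       const std::vector<std::string>& param_type_names) {
  std::string name = base_name;
  for (const auto& type_name : param_type_names) {
    name += "_" + type_name;  // each parameter type is joined with an underscore
  }
  return name;
}
```

Calling ``MakePcName("add", {"int32", "int32"})`` yields ``add_int32_int32``, matching the example in the list above. Since the convention is optional, any other unique name would work as well.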
+
+3.2 External C Functions
+------------------------
+
+External C functions can be authored in different languages and exposed as C 
functions. Their parameter and return types must be compatible with Gandiva's 
type system.
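
As a rough sketch of what "exposed as a C function" means in practice, the snippet below defines a function in C++ with C linkage. The function name is illustrative and simply follows the ``pc_name`` convention from section 3.1. It assumes Gandiva's ``int32`` corresponds to ``int32_t`` on the C side; registering the function with Gandiva is not shown here.

```cpp
#include <cstdint>

// Illustrative external function: implemented in C++, but exported with C
// linkage so that its symbol name is exactly "add_int32_int32" (no C++ name
// mangling). Assumption: Gandiva's int32 maps to int32_t.
extern "C" int32_t add_int32_int32(int32_t left, int32_t right) {
  return left + right;
}
```

The ``extern "C"`` linkage is what keeps the symbol name stable and callable under the C calling convention. Functions written in Rust or Zig would need the equivalent export mechanism in those languages.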
+
+3.2.1 C Function Signature
+**************************
+
+3.2.1.1 Signature Mapping
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following table lists the mapping between Gandiva external function 
signature types and the C function signature types:

Review Comment:
   @js8544 I added `time64`, `binary`, `interval_month`, and 
`interval_day_time`, but I am not sure how `decimal` should be represented in C 
functions, so it is not added yet. Are there any sample stub functions that 
take a decimal parameter I can learn from? Thanks


