Copilot commented on code in PR #9971:
URL: https://github.com/apache/gravitino/pull/9971#discussion_r2802170570
##########
docs/manage-user-defined-function-using-gravitino.md:
##########
@@ -0,0 +1,703 @@
+---
+title: Manage user-defined function using Gravitino
Review Comment:
Title/link text uses singular “Manage user-defined function using
Gravitino”, but the page content and other “Manage …” docs typically use
plurals for collections (e.g., jobs/tags) and this page is about managing
multiple functions. Consider renaming the title (and the index link text) to
“Manage user-defined functions using Gravitino” for grammatical consistency
(slug can remain unchanged).
```suggestion
title: Manage user-defined functions using Gravitino
```
##########
docs/spark-connector/spark-connector-udf.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Spark connector - User-defined functions"
+slug: /spark-connector/spark-connector-udf
+keyword: spark connector UDF user-defined function
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+The Apache Gravitino Spark connector supports loading user-defined functions (UDFs) registered
+in the Gravitino function registry. Once a function is
+[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark can discover and
+invoke it through standard Spark SQL syntax — no additional `CREATE FUNCTION` statement is needed.
+
+:::note
+Currently, only **Java implementations** with `RuntimeType.SPARK` are supported in the Spark
Review Comment:
The new Spark UDF doc claims the Gravitino Spark connector can discover and
invoke Gravitino-registered functions “with no additional CREATE FUNCTION”, but
the existing Spark connector docs explicitly list “Function operations” /
“Querying UDF” as not supported (e.g.
docs/spark-connector/spark-catalog-hive.md), and the Spark connector
implementation primarily implements TableCatalog (no Gravitino function
registry integration). Please reconcile this page with current connector
capabilities (either adjust the claims / mark as future/planned, or update the
connector docs and implementation if function support has been added).
```suggestion
The Apache Gravitino Spark connector is designed to load user-defined functions (UDFs) registered
in the Gravitino function registry. Once a function is
[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark will be able to
discover and invoke it through standard Spark SQL syntax.
:::note
This document describes **planned / in-development** functionality. In current releases of the
Spark connector, function operations (including discovering and invoking Gravitino-registered UDFs)
are **not yet supported**, and you must still use standard Spark mechanisms such as `CREATE FUNCTION`
to register and call UDFs.
In addition, only **Java implementations** with `RuntimeType.SPARK` will be supported in the Spark
```
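Since the suggested note points readers at `CREATE FUNCTION`, it may also help to show the current workaround inline. A minimal sketch, assuming a UDF class packaged in a jar (class name and jar path are placeholders, and the class must satisfy Spark's usual `CREATE FUNCTION` requirements, e.g. a Hive-style UDF):

```sql
-- Session-scoped registration via plain Spark SQL; the Gravitino function
-- registry is not involved. Class name and jar path are placeholders.
CREATE TEMPORARY FUNCTION add_one AS 'com.example.udf.AddOne'
  USING JAR 'hdfs:///libs/my-udfs.jar';

-- Invoke it like any built-in function.
SELECT add_one(42);
```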
##########
docs/manage-user-defined-function-using-gravitino.md:
##########
@@ -0,0 +1,703 @@
+---
+title: Manage user-defined function using Gravitino
+slug: /manage-user-defined-function-using-gravitino
+keyword: Gravitino user-defined function UDF manage
+license: This software is licensed under the Apache License version 2.
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This page introduces how to manage user-defined functions (UDFs) in Apache Gravitino. Gravitino
+provides a centralized function registry that allows you to define custom functions once and
+share them across multiple compute engines like Spark and Trino.
+
+A function in Gravitino is characterized by:
+
+- **Name**: The function identifier within a schema.
+- **Function type**: `SCALAR` (row-by-row operations), `AGGREGATE` (group operations), or
+  `TABLE` (set-returning operations).
+- **Deterministic**: Whether the function always returns the same result for the same input.
+- **Definitions**: One or more overloads, each with a specific parameter list, return type
+  (or return columns for table functions), and one or more implementations for different
+ runtimes (e.g. Spark, Trino).
+
+Each definition can have multiple implementations in different languages (SQL, Java, Python)
+targeting different runtimes. **Each definition must have at most one implementation per
+runtime** — for example, you cannot have two implementations both targeting `SPARK` in the
+same definition. To replace an existing implementation, use `updateImpl` instead of `addImpl`.
+
+| Language | Key fields             | Description                                          |
+|----------|------------------------|------------------------------------------------------|
+| SQL      | `sql`                  | An inline SQL expression.                            |
+| Java     | `className`            | Fully qualified Java class name.                     |
+| Python   | `handler`, `codeBlock` | Python handler entry point and optional inline code. |
+
+To use function management, please make sure that:
+
+ - The Gravitino server has started and is serving at, e.g. [http://localhost:8090](http://localhost:8090).
+ - A metalake has been created.
+ - A catalog has been created within the metalake.
+ - A schema has been created within the catalog.
+
+## Function operations
+
+### Register a function
+
+You can register a function by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions`
+endpoint or just use the Gravitino Java/Python client. The following is an example of registering
+a scalar function:
+
+<Tabs groupId="language" queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+ "name": "add_one",
+ "functionType": "SCALAR",
+ "deterministic": true,
+ "comment": "A scalar function that adds one to the input",
+ "definitions": [
+ {
+ "parameters": [
+ {"name": "x", "dataType": "integer"}
+ ],
+ "returnType": "integer",
+ "impls": [
+ {
+ "language": "SQL",
+ "runtime": "SPARK",
Review Comment:
In the “Register a function” example, the Spark-targeting implementation is
shown as `language: "SQL"` / `runtime: "SPARK"` (and similarly later overload
examples use SQL+SPARK). This conflicts with the new Spark connector UDF page
which states only Java implementations are currently supported in Spark. Please
either switch the Spark-oriented examples to Java implementations, or add an
explicit note that SQL/Python impls may be stored in Gravitino but are not
invokable from Spark yet.
```suggestion
"runtime": "TRINO",
```
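If the example should instead keep targeting Spark, a Java implementation entry along these lines would be consistent with the connector page's Java-only claim. Field names follow the language table on this page; the class name and the exact enum spellings are assumptions:

```json
{
  "language": "JAVA",
  "runtime": "SPARK",
  "className": "com.example.udf.AddOneFunction"
}
```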
##########
docs/index.md:
##########
@@ -61,6 +61,8 @@ You can use either to manage metadata. See
messaging metadata.
* [Manage model metadata using Gravitino](./manage-model-metadata-using-gravitino.md) to learn how to manage
  model metadata.
+* [Manage user-defined function using Gravitino](./manage-user-defined-function-using-gravitino.md) to learn how to manage
Review Comment:
This new index entry uses singular “Manage user-defined function …” while
the linked page is about managing UDFs as a set. Consider changing the link
text to “Manage user-defined functions using Gravitino” to match the page’s
scope and improve grammar.
```suggestion
* [Manage user-defined functions using Gravitino](./manage-user-defined-function-using-gravitino.md) to learn how to manage
```
##########
docs/spark-connector/spark-connector-udf.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Spark connector - User-defined functions"
+slug: /spark-connector/spark-connector-udf
+keyword: spark connector UDF user-defined function
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+The Apache Gravitino Spark connector supports loading user-defined functions (UDFs) registered
+in the Gravitino function registry. Once a function is
+[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark can discover and
+invoke it through standard Spark SQL syntax — no additional `CREATE FUNCTION` statement is needed.
+
+:::note
+Currently, only **Java implementations** with `RuntimeType.SPARK` are supported in the Spark
+connector. SQL and Python implementations registered in Gravitino cannot yet be invoked
+directly from Spark. Support for additional languages is planned for future releases.
+:::
+
+## Prerequisites
+
+Before using Gravitino UDFs in Spark, ensure the following:
+
+1. The **Spark connector is configured** and the catalog is accessible
+ (see [Spark connector setup](spark-connector.md)).
+2. The function has been **registered in Gravitino** with at least one definition that includes
+   a Java implementation targeting `RuntimeType.SPARK`
+   (see [Register a function](../manage-user-defined-function-using-gravitino.md#register-a-function)).
+3. The **JAR containing the UDF class** is available on the Spark classpath (e.g. via
+ `--jars` or `spark.jars` configuration).
+
+## Java UDF requirements
+
+The Java class specified in `className` of the function implementation must implement Spark's
+`org.apache.spark.sql.connector.catalog.functions.UnboundFunction` interface. For details on
+implementing custom Spark functions, refer to the
+[Spark DataSource V2 Functions documentation](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/connector/catalog/functions/UnboundFunction.html).
+
+Key points:
+
+- The class must have a **public no-arg constructor**.
+- The class must be on the **Spark driver and executor classpath**.
+- Only functions with `RuntimeType.SPARK` are visible to the Spark connector; implementations
+ targeting other runtimes (e.g. `TRINO`) are filtered out.
+
+## Calling functions in Spark SQL
+
+Use the fully qualified three-part name `catalog.schema.function_name` to call a
+Gravitino-registered function:
+
+```sql
+-- Call a scalar function
+SELECT my_catalog.my_schema.add_one(42);
+
+-- Use in a query
+SELECT id, my_catalog.my_schema.add_one(value) AS incremented
+FROM my_catalog.my_schema.my_table;
+```
+
+:::tip
+You can simplify the syntax by setting the default catalog and schema first:
+
+```sql
+USE my_catalog.my_schema;
Review Comment:
The examples use `USE my_catalog.my_schema;` to set defaults. This differs
from other Spark connector docs in this repo (which use separate `USE
<catalog>;` then `USE <schema>;`) and may not work consistently across Spark
versions. Consider changing the tip to the same pattern used elsewhere (or
explicitly use `USE CATALOG` + `USE` if that’s the intended syntax).
```suggestion
USE my_catalog;
USE my_schema;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]