Copilot commented on code in PR #9971:
URL: https://github.com/apache/gravitino/pull/9971#discussion_r2802170570
##########
docs/manage-user-defined-function-using-gravitino.md:
##########
@@ -0,0 +1,703 @@
+---
+title: Manage user-defined function using Gravitino
Review Comment:
Title/link text uses singular “Manage user-defined function using
Gravitino”, but the page content and other “Manage …” docs typically use
plurals for collections (e.g., jobs/tags) and this page is about managing
multiple functions. Consider renaming the title (and the index link text) to
“Manage user-defined functions using Gravitino” for grammatical consistency
(slug can remain unchanged).
```suggestion
title: Manage user-defined functions using Gravitino
```
##########
docs/spark-connector/spark-connector-udf.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Spark connector - User-defined functions"
+slug: /spark-connector/spark-connector-udf
+keyword: spark connector UDF user-defined function
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+The Apache Gravitino Spark connector supports loading user-defined functions (UDFs) registered
+in the Gravitino function registry. Once a function is
+[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark can discover and
+invoke it through standard Spark SQL syntax — no additional `CREATE FUNCTION` statement is needed.
+
+:::note
+Currently, only **Java implementations** with `RuntimeType.SPARK` are supported in the Spark
Review Comment:
The new Spark UDF doc claims the Gravitino Spark connector can discover and
invoke Gravitino-registered functions “with no additional CREATE FUNCTION”, but
the existing Spark connector docs explicitly list “Function operations” /
“Querying UDF” as not supported (e.g.
docs/spark-connector/spark-catalog-hive.md), and the Spark connector
implementation primarily implements TableCatalog (no Gravitino function
registry integration). Please reconcile this page with current connector
capabilities (either adjust the claims / mark as future/planned, or update the
connector docs and implementation if function support has been added).
```suggestion
The Apache Gravitino Spark connector is designed to load user-defined functions (UDFs) registered
in the Gravitino function registry. Once a function is
[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark will be able to
discover and invoke it through standard Spark SQL syntax.
:::note
This document describes **planned / in-development** functionality. In current releases of the
Spark connector, function operations (including discovering and invoking Gravitino-registered UDFs)
are **not yet supported**, and you must still use standard Spark mechanisms such as `CREATE FUNCTION`
to register and call UDFs.
In addition, only **Java implementations** with `RuntimeType.SPARK` will be supported in the Spark
```
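Since the suggested note points readers at `CREATE FUNCTION`, it may also help to show the current workaround inline. A minimal sketch, assuming a UDF class packaged in a jar (class name and jar path are placeholders, and the class must satisfy Spark's usual `CREATE FUNCTION` requirements, e.g. a Hive-style UDF):

```sql
-- Session-scoped registration via plain Spark SQL; the Gravitino function
-- registry is not involved. Class name and jar path are placeholders.
CREATE TEMPORARY FUNCTION add_one AS 'com.example.udf.AddOne'
  USING JAR 'hdfs:///libs/my-udfs.jar';

-- Invoke it like any built-in function.
SELECT add_one(42);
```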
##########
docs/manage-user-defined-function-using-gravitino.md:
##########
@@ -0,0 +1,703 @@
+---
+title: Manage user-defined function using Gravitino
+slug: /manage-user-defined-function-using-gravitino
+keyword: Gravitino user-defined function UDF manage
+license: This software is licensed under the Apache License version 2.
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This page introduces how to manage user-defined functions (UDFs) in Apache Gravitino. Gravitino
+provides a centralized function registry that allows you to define custom functions once and
+share them across multiple compute engines like Spark and Trino.
+
+A function in Gravitino is characterized by:
+
+- **Name**: The function identifier within a schema.
+- **Function type**: `SCALAR` (row-by-row operations), `AGGREGATE` (group operations), or
+  `TABLE` (set-returning operations).
+- **Deterministic**: Whether the function always returns the same result for the same input.
+- **Definitions**: One or more overloads, each with a specific parameter list, return type
+  (or return columns for table functions), and one or more implementations for different
+ runtimes (e.g. Spark, Trino).
+
+Each definition can have multiple implementations in different languages (SQL, Java, Python)
+targeting different runtimes. **Each definition must have at most one implementation per
+runtime** — for example, you cannot have two implementations both targeting `SPARK` in the
+same definition. To replace an existing implementation, use `updateImpl` instead of `addImpl`.
+
+| Language | Key fields             | Description                                          |
+|----------|------------------------|------------------------------------------------------|
+| SQL      | `sql`                  | An inline SQL expression.                            |
+| Java     | `className`            | Fully qualified Java class name.                     |
+| Python   | `handler`, `codeBlock` | Python handler entry point and optional inline code. |
+
+To use function management, please make sure that:
+
+ - The Gravitino server has started and is serving at, e.g. [http://localhost:8090](http://localhost:8090).
+ - A metalake has been created.
+ - A catalog has been created within the metalake.
+ - A schema has been created within the catalog.
+
+## Function operations
+
+### Register a function
+
+You can register a function by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions`
+endpoint or just use the Gravitino Java/Python client. The following is an example of registering
+a scalar function:
+
+<Tabs groupId="language" queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+ "name": "add_one",
+ "functionType": "SCALAR",
+ "deterministic": true,
+ "comment": "A scalar function that adds one to the input",
+ "definitions": [
+ {
+ "parameters": [
+ {"name": "x", "dataType": "integer"}
+ ],
+ "returnType": "integer",
+ "impls": [
+ {
+ "language": "SQL",
+ "runtime": "SPARK",
Review Comment:
In the “Register a function” example, the Spark-targeting implementation is
shown as `language: "SQL"` / `runtime: "SPARK"` (and similarly later overload
examples use SQL+SPARK). This conflicts with the new Spark connector UDF page
which states only Java implementations are currently supported in Spark. Please
either switch the Spark-oriented examples to Java implementations, or add an
explicit note that SQL/Python impls may be stored in Gravitino but are not
invokable from Spark yet.
```suggestion
"runtime": "TRINO",
```
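If the example should instead keep targeting Spark, a Java implementation entry along these lines would be consistent with the connector page's Java-only claim. Field names follow the language table on this page; the class name and the exact enum spellings are assumptions:

```json
{
  "language": "JAVA",
  "runtime": "SPARK",
  "className": "com.example.udf.AddOneFunction"
}
```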
##########
docs/index.md:
##########
@@ -61,6 +61,8 @@ You can use either to manage metadata. See
messaging metadata.
* [Manage model metadata using Gravitino](./manage-model-metadata-using-gravitino.md) to learn how to manage
  model metadata.
+* [Manage user-defined function using Gravitino](./manage-user-defined-function-using-gravitino.md) to learn how to manage
Review Comment:
This new index entry uses singular “Manage user-defined function …” while
the linked page is about managing UDFs as a set. Consider changing the link
text to “Manage user-defined functions using Gravitino” to match the page’s
scope and improve grammar.
```suggestion
* [Manage user-defined functions using Gravitino](./manage-user-defined-function-using-gravitino.md) to learn how to manage
```
##########
docs/spark-connector/spark-connector-udf.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Spark connector - User-defined functions"
+slug: /spark-connector/spark-connector-udf
+keyword: spark connector UDF user-defined function
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+The Apache Gravitino Spark connector supports loading user-defined functions (UDFs) registered
+in the Gravitino function registry. Once a function is
+[registered in Gravitino](../manage-user-defined-function-using-gravitino.md), Spark can discover and
+invoke it through standard Spark SQL syntax — no additional `CREATE FUNCTION` statement is needed.
+
+:::note
+Currently, only **Java implementations** with `RuntimeType.SPARK` are supported in the Spark
+connector. SQL and Python implementations registered in Gravitino cannot yet be invoked
+directly from Spark. Support for additional languages is planned for future releases.
+:::
+
+## Prerequisites
+
+Before using Gravitino UDFs in Spark, ensure the following:
+
+1. The **Spark connector is configured** and the catalog is accessible
+ (see [Spark connector setup](spark-connector.md)).
+2. The function has been **registered in Gravitino** with at least one definition that includes
+   a Java implementation targeting `RuntimeType.SPARK`
+   (see [Register a function](../manage-user-defined-function-using-gravitino.md#register-a-function)).
+3. The **JAR containing the UDF class** is available on the Spark classpath (e.g. via
+ `--jars` or `spark.jars` configuration).
+
+## Java UDF requirements
+
+The Java class specified in `className` of the function implementation must implement Spark's
+`org.apache.spark.sql.connector.catalog.functions.UnboundFunction` interface. For details on
+implementing custom Spark functions, refer to the
+[Spark DataSource V2 Functions documentation](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/connector/catalog/functions/UnboundFunction.html).
+
+Key points:
+
+- The class must have a **public no-arg constructor**.
+- The class must be on the **Spark driver and executor classpath**.
+- Only functions with `RuntimeType.SPARK` are visible to the Spark connector; implementations
+ targeting other runtimes (e.g. `TRINO`) are filtered out.
+
+## Calling functions in Spark SQL
+
+Use the fully qualified three-part name `catalog.schema.function_name` to call a
+Gravitino-registered function:
+
+```sql
+-- Call a scalar function
+SELECT my_catalog.my_schema.add_one(42);
+
+-- Use in a query
+SELECT id, my_catalog.my_schema.add_one(value) AS incremented
+FROM my_catalog.my_schema.my_table;
+```
+
+:::tip
+You can simplify the syntax by setting the default catalog and schema first:
+
+```sql
+USE my_catalog.my_schema;
Review Comment:
The examples use `USE my_catalog.my_schema;` to set defaults. This differs
from other Spark connector docs in this repo (which use separate `USE
<catalog>;` then `USE <schema>;`) and may not work consistently across Spark
versions. Consider changing the tip to the same pattern used elsewhere (or
explicitly use `USE CATALOG` + `USE` if that’s the intended syntax).
```suggestion
USE my_catalog;
USE my_schema;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]