flyrain commented on code in PR #14117:
URL: https://github.com/apache/iceberg/pull/14117#discussion_r2636649346


##########
format/udf-spec.md:
##########
@@ -0,0 +1,402 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts input parameters and executes a function body.
+Depending on the function type, the result can be:
+
+- **Scalar function (UDF)** – returns a scalar value, which may be a primitive type (e.g., `int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table function (UDTF)** – returns a table with zero or more rows of columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in Iceberg.
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The metadata is self-contained and can be moved across catalogs.
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new definition, updated representation, changed properties, etc.) creates a new metadata file and atomically swaps it in as the current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                   | Description |
+|-------------|-------------------|------------------------|-----------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`               | A UUID that identifies the function, generated once at creation. |
+| *required*  | `format-version`  | `int`                  | Metadata format version (must be `1`). |
+| *required*  | `definitions`     | `list<definition>`     | List of function [definition](#definition) entities. |
+| *required*  | `definition-log`  | `list<definition-log>` | History of [definition snapshots](#definition-log). |
+| *required*  | `parameter-names` | `list<parameter-name>` | Global ordered parameter names shared across all overloads. Overloads must use a prefix of this list, in order. |
+| *optional*  | `location`        | `string`               | Storage location of metadata files. |
+| *optional*  | `properties`      | `map<string,string>`   | A string-to-string map of properties. |
+| *optional*  | `secure`          | `boolean`              | Whether it is a secure function. Default: `false`. |
+| *optional*  | `doc`             | `string`               | Documentation string. |
+
+### Parameter-Name
+| Requirement | Field  | Type     | Description              |
+|-------------|--------|----------|--------------------------|
+| *required*  | `name` | `string` | Parameter name.          |
+| *optional*  | `doc`  | `string` | Parameter documentation. |
+
+Notes:
+1. When `secure` is `true`:
+   - Engines MUST NOT expose the function definition or its body through any form of metadata inspection (e.g., `SHOW FUNCTIONS`).
+   - Engines MUST prevent leakage of sensitive information during execution via error messages, logs, query plans, or intermediate results.
+   - Engines MUST NOT perform predicate reordering, short-circuiting, or other optimizations that could change the order or scope of data access.
+2. Entries in `properties` are treated as hints, not strict rules. Engines MAY choose to honor them or ignore them.
+3. `parameter-names` is the source of truth for parameter naming across all overload definitions, which makes named-argument
+   invocation consistent across definitions:
+   - An overload with N parameters uses the first N entries of this list, in order.
+   - Names and their relative ordering are immutable. Only appending new names is allowed.
+   - Only the `doc` field may be updated in place.
+   - Overloads may differ in parameter types, arity, or both (e.g., `foo(int)`, `foo(float)`, `foo(string, string)`), and
+     they all derive their parameter names positionally from the same `parameter-names` prefix.
+
+### Definition
+
+Each `definition` represents one function signature (e.g., `add_one(int)` vs `add_one(float)`).
+
+| Requirement | Field name           | Type     | Description |
+|-------------|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `definition-id`      | `string` | An identifier derived from the canonical parameter-type tuple (lowercase, no spaces; e.g., `"(int,int,string)"`). If longer than 128 chars, use hashed form `"sig1-<base32(SHA-256(signature))[:26]>"`. |

Review Comment:
   Removed the hash part per consensus in the community sync.

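The canonical `definition-id` form in the quoted `Definition` table (lowercase, no spaces, e.g. `"(int,int,string)"`) can be sketched as follows. This is a minimal illustration. It assumes that each parameter type arrives as a type string and that canonicalization means lowercasing each type and removing all whitespace. `canonical_definition_id` is a hypothetical helper, not part of the spec or of any Iceberg library. The review comment drops the hashed long-signature fallback, so the sketch does not implement it.

```python
def canonical_definition_id(param_types: list[str]) -> str:
    """Build a definition-id from an ordered list of parameter type strings.

    Assumption: canonicalization = lowercase each type and drop all whitespace,
    then join with commas inside parentheses. The hashed long-signature form
    is intentionally omitted, since the review comment removes it.
    """
    normalized = ["".join(t.split()).lower() for t in param_types]
    return "(" + ",".join(normalized) + ")"


# Usage: overloads of a hypothetical add_one function get distinct ids.
print(canonical_definition_id(["INT"]))                    # (int)
print(canonical_definition_id(["int", "Int", " string "]))  # (int,int,string)
print(canonical_definition_id(["decimal(10, 2)"]))         # (decimal(10,2))
```

Because the id depends only on the ordered type tuple, `add_one(int)` and `add_one(float)` can never share an id. Whitespace or casing differences between engines also collapse to the same string.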

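A catalog or engine could enforce the `parameter-names` rules from note 3 of the quoted diff roughly as shown below. Those rules are: names are taken by prefix according to arity, names are append-only, and only `doc` may be edited in place. The sketch assumes parameter-name entries are plain dicts with a required `name` key and an optional `doc` key, mirroring the `Parameter-Name` table. `names_for_arity` and `is_valid_parameter_names_update` are hypothetical helpers for illustration only.

```python
from typing import Dict, List

# A parameter-name entry as in the spec's table: required "name", optional "doc".
ParameterName = Dict[str, str]


def names_for_arity(parameter_names: List[ParameterName], arity: int) -> List[str]:
    """Names used by an overload with `arity` parameters: the first `arity` entries, in order."""
    if arity < 0 or arity > len(parameter_names):
        raise ValueError(
            f"arity {arity} is outside 0..{len(parameter_names)} declared parameter names"
        )
    return [entry["name"] for entry in parameter_names[:arity]]


def is_valid_parameter_names_update(
    old: List[ParameterName], new: List[ParameterName]
) -> bool:
    """True if `new` only appends names to `old` (existing `doc` values may change)."""
    if len(new) < len(old):
        return False  # dropping names would break existing overloads
    # Existing names must keep both their value and their position.
    return all(o["name"] == n["name"] for o, n in zip(old, new))


params = [{"name": "x"}, {"name": "y", "doc": "second argument"}]
print(names_for_arity(params, 1))  # ['x'], e.g. foo(int)
print(names_for_arity(params, 2))  # ['x', 'y'], e.g. foo(string, string)
print(is_valid_parameter_names_update(params, params + [{"name": "z"}]))  # True
print(is_valid_parameter_names_update(params, [{"name": "y"}, {"name": "x"}]))  # False
```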

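The Overview's commit model can be modeled as a compare-and-swap on the catalog's pointer to the current metadata file. Under that model, every change writes a new metadata file, and the new file is atomically swapped in as current. The sketch below assumes a hypothetical in-memory catalog keyed by function name. `InMemoryUdfCatalog` and the `metadata/0000N.metadata.json` paths are illustrative only, not an Iceberg API or file-naming rule. A real catalog would perform the same check-and-set in its own transactional store.

```python
import threading
from typing import Dict, Optional


class InMemoryUdfCatalog:
    """Hypothetical catalog mapping a function name to its current metadata file location."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Dict[str, str] = {}

    def current(self, name: str) -> Optional[str]:
        with self._lock:
            return self._current.get(name)

    def commit(self, name: str, expected: Optional[str], new_location: str) -> bool:
        """Swap in `new_location` only if the current pointer still equals `expected`."""
        with self._lock:
            if self._current.get(name) != expected:
                return False  # a concurrent writer won; re-read, rebuild metadata, retry
            self._current[name] = new_location
            return True


catalog = InMemoryUdfCatalog()
catalog.commit("add_one", None, "metadata/00001.metadata.json")  # create the function
ok = catalog.commit("add_one", "metadata/00001.metadata.json", "metadata/00002.metadata.json")
stale = catalog.commit("add_one", "metadata/00001.metadata.json", "metadata/00003.metadata.json")
print(ok, stale, catalog.current("add_one"))
# True False metadata/00002.metadata.json
```

A writer whose `commit` returns `False` would re-read the current metadata and retry. The losing writer's new file is simply never referenced, so readers only ever see a complete, self-contained metadata file.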