stevenzwu commented on code in PR #14117:
URL: https://github.com/apache/iceberg/pull/14117#discussion_r2427391049


##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).

Review Comment:
   I thought we agreed to change it to `Scalar functions (UDFs)` in Dan's 
comment. Scalar function is a pretty common term (used in Spark, Flink, Trino).



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description      
                                                                                
                        |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that 
identifies the function, generated once at creation.                            
                             |
+| *required*  | `format-version`  | `int`                   | Metadata format 
version (must be `1`).                                                          
                         |
+| *required*  | `definitions`     | `array<overload>`       | List of function 
[overload](#overload) entities.                                                 
                        |
+| *required*  | `definition-log`  | `array<definition-log>` | History of 
[definition snapshots](#definition-log).                                        
                              |
+| *required*  | `max-overload-id` | `long`                  | Highest 
`overload-id` currently assigned in this metadata file. Used to allocate new 
overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location 
of metadata files.                                                              
                        |
+| *optional*  | `properties`      | `map`                   | Arbitrary 
key–value pairs.                                                                
                               |

Review Comment:
   nit on description
   ```
   A string to string map of UDF properties
   ```



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description      
                                                                                
                        |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that 
identifies the function, generated once at creation.                            
                             |
+| *required*  | `format-version`  | `int`                   | Metadata format 
version (must be `1`).                                                          
                         |
+| *required*  | `definitions`     | `array<overload>`       | List of function 
[overload](#overload) entities.                                                 
                        |
+| *required*  | `definition-log`  | `array<definition-log>` | History of 
[definition snapshots](#definition-log).                                        
                              |
+| *required*  | `max-overload-id` | `long`                  | Highest 
`overload-id` currently assigned in this metadata file. Used to allocate new 
overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location 
of metadata files.                                                              
                        |
+| *optional*  | `properties`      | `map`                   | Arbitrary 
key–value pairs.                                                                
                               |
+| *optional*  | `secure`          | `boolean`               | Whether it is a 
secure function. Default: `false`.                                              
                         |
+| *optional*  | `doc`             | `string`                | Documentation 
string.                                                                         
                           |
+
+Notes:
+1. When `secure` is `true`,
+    - Engines **SHOULD NOT** expose the function definition through any 
inspection (e.g., `SHOW FUNCTIONS`).
+    - Engines **SHOULD** ensure that execution does not leak sensitive 
information through any channels, such as error messages, logs, or query plans.
+   
+### Overload
+
+Function overloads allow multiple implementations of the same function name 
with different signatures. Each overload has
+the following fields:
+
+| Requirement | Field name                 | Type                              
            | Description                                                       
                            |
+|-------------|----------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
+| *required*  | `overload-id`              | `long`                            
            | Monotonically increasing identifier of this function overload.    
                            |
+| *required*  | `parameters`               | `array<parameter>`                
            | Ordered list of [function parameters](#parameter). Invocation 
order **must** match this list. |
+| *required*  | `return-type`              | `Type` (Iceberg data type; 
`struct` for UDTF) | Return type. Example: `"string"`, `"struct<...>"`.         
                                   |

Review Comment:
   `struct` is also an Iceberg field type. Can it also be the return type of a 
scalar function (struct -> struct)?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description      
                                                                                
                        |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that 
identifies the function, generated once at creation.                            
                             |
+| *required*  | `format-version`  | `int`                   | Metadata format 
version (must be `1`).                                                          
                         |
+| *required*  | `definitions`     | `array<overload>`       | List of function 
[overload](#overload) entities.                                                 
                        |
+| *required*  | `definition-log`  | `array<definition-log>` | History of 
[definition snapshots](#definition-log).                                        
                              |
+| *required*  | `max-overload-id` | `long`                  | Highest 
`overload-id` currently assigned in this metadata file. Used to allocate new 
overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location 
of metadata files.                                                              
                        |
+| *optional*  | `properties`      | `map`                   | Arbitrary 
key–value pairs.                                                                
                               |
+| *optional*  | `secure`          | `boolean`               | Whether it is a 
secure function. Default: `false`.                                              
                         |
+| *optional*  | `doc`             | `string`                | Documentation 
string.                                                                         
                           |
+
+Notes:
+1. When `secure` is `true`,
+    - Engines **SHOULD NOT** expose the function definition through any 
inspection (e.g., `SHOW FUNCTIONS`).
+    - Engines **SHOULD** ensure that execution does not leak sensitive 
information through any channels, such as error messages, logs, or query plans.
+   
+### Overload
+
+Function overloads allow multiple implementations of the same function name 
with different signatures. Each overload has
+the following fields:
+
+| Requirement | Field name                 | Type                              
            | Description                                                       
                            |
+|-------------|----------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
+| *required*  | `overload-id`              | `long`                            
            | Monotonically increasing identifier of this function overload.    
                            |
+| *required*  | `parameters`               | `array<parameter>`                
            | Ordered list of [function parameters](#parameter). Invocation 
order **must** match this list. |
+| *required*  | `return-type`              | `Type` (Iceberg data type; 
`struct` for UDTF) | Return type. Example: `"string"`, `"struct<...>"`.         
                                   |
+| *required*  | `versions`                 | `array<overload-version>`         
            | [Versioned implementations](#overload-version) of this overload.  
                            |
+| *required*  | `current-overload-version` | `long`                            
            | Monotonically increasing identifier of the current overload 
version.                          |
+| *optional*  | `function-type`            | `enum { "udf", "udtf" }` (default 
`"udf"`)    | If `"udtf"`, `return-type` must be a `struct` describing the 
output schema.                   |

Review Comment:
   There is no enum type in Iceberg. I guess we should use `string` here. We 
can clarify the valid values as `udf` and `udtf` in the type column or the 
description column.
   
   E.g., the table spec has the following definition for the data file `content` type:
   ```
   int with meaning: 0: DATA, 1: POSITION DELETES, 2: EQUALITY DELETES
   ```
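
To illustrate the `string` suggestion above, here is a minimal sketch of how a reader might validate `function-type` once it is a plain string. The helper name `parse_function_type` and the dict-shaped overload are hypothetical and not part of the spec or of any Iceberg library; the default of `"udf"` is taken from the table row under review.

```python
# Hypothetical sketch: treat `function-type` as a string with two valid values.
# Nothing here is an Iceberg API; names are illustrative only.
VALID_FUNCTION_TYPES = {"udf", "udtf"}


def parse_function_type(overload: dict) -> str:
    """Return the overload's function type, defaulting to "udf" when absent."""
    value = overload.get("function-type", "udf")
    if not isinstance(value, str) or value not in VALID_FUNCTION_TYPES:
        raise ValueError(
            f"invalid function-type {value!r}; expected one of "
            f"{sorted(VALID_FUNCTION_TYPES)}"
        )
    return value
```

With this shape, the valid values live in the description rather than in a dedicated enum type, which mirrors how the table spec documents `content`.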



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description      
                                                                                
                        |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that 
identifies the function, generated once at creation.                            
                             |
+| *required*  | `format-version`  | `int`                   | Metadata format 
version (must be `1`).                                                          
                         |
+| *required*  | `definitions`     | `array<overload>`       | List of function 
[overload](#overload) entities.                                                 
                        |
+| *required*  | `definition-log`  | `array<definition-log>` | History of 
[definition snapshots](#definition-log).                                        
                              |
+| *required*  | `max-overload-id` | `long`                  | Highest 
`overload-id` currently assigned in this metadata file. Used to allocate new 
overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location 
of metadata files.                                                              
                        |
+| *optional*  | `properties`      | `map`                   | Arbitrary 
key–value pairs.                                                                
                               |
+| *optional*  | `secure`          | `boolean`               | Whether it is a 
secure function. Default: `false`.                                              
                         |
+| *optional*  | `doc`             | `string`                | Documentation 
string.                                                                         
                           |
+
+Notes:
+1. When `secure` is `true`,
+    - Engines **SHOULD NOT** expose the function definition through any 
inspection (e.g., `SHOW FUNCTIONS`).
+    - Engines **SHOULD** ensure that execution does not leak sensitive 
information through any channels, such as error messages, logs, or query plans.
+   
+### Overload
+
+Function overloads allow multiple implementations of the same function name 
with different signatures. Each overload has
+the following fields:
+
+| Requirement | Field name                 | Type                              
            | Description                                                       
                            |
+|-------------|----------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
+| *required*  | `overload-id`              | `long`                            
            | Monotonically increasing identifier of this function overload.    
                            |
+| *required*  | `parameters`               | `array<parameter>`                
            | Ordered list of [function parameters](#parameter). Invocation 
order **must** match this list. |

Review Comment:
   Since the Iceberg data type is named `list`, should we use `list` instead of 
`array` here?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 

Review Comment:
   > can be moved across catalogs
   
   What does this mean?



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that identifies the function, generated once at creation. |
+| *required*  | `format-version`  | `int`                   | Metadata format version (must be `1`). |
+| *required*  | `definitions`     | `array<overload>`       | List of function [overload](#overload) entities. |
+| *required*  | `definition-log`  | `array<definition-log>` | History of [definition snapshots](#definition-log). |
+| *required*  | `max-overload-id` | `long`                  | Highest `overload-id` currently assigned in this metadata file. Used to allocate new overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location of metadata files. |
+| *optional*  | `properties`      | `map`                   | Arbitrary key–value pairs. |
+| *optional*  | `secure`          | `boolean`               | Whether it is a secure function. Default: `false`. |
+| *optional*  | `doc`             | `string`                | Documentation string. |
+
+Notes:
+1. When `secure` is `true`,
+    - Engines **SHOULD NOT** expose the function definition through any 
inspection (e.g., `SHOW FUNCTIONS`).
+    - Engines **SHOULD** ensure that execution does not leak sensitive 
information through any channels, such as error messages, logs, or query plans.
+   
+### Overload
+
+Function overloads allow multiple implementations of the same function name 
with different signatures. Each overload has
+the following fields:
+
+| Requirement | Field name                 | Type                                          | Description |
+|-------------|----------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
+| *required*  | `overload-id`              | `long`                                        | Monotonically increasing identifier of this function overload. |
+| *required*  | `parameters`               | `array<parameter>`                            | Ordered list of [function parameters](#parameter). Invocation order **must** match this list. |
+| *required*  | `return-type`              | `Type` (Iceberg data type; `struct` for UDTF) | Return type. Example: `"string"`, `"struct<...>"`. |
+| *required*  | `versions`                 | `array<overload-version>`                     | [Versioned implementations](#overload-version) of this overload. |
+| *required*  | `current-overload-version` | `long`                                        | Monotonically increasing identifier of the current overload version. |
+| *optional*  | `function-type`            | `enum { "udf", "udtf" }` (default `"udf"`)    | If `"udtf"`, `return-type` must be a `struct` describing the output schema. |
+| *optional*  | `doc`                      | `string`                                      | Documentation string. |
+
+### Parameter
+| Requirement | Field  | Type     | Description              |
+|-------------|--------|----------|--------------------------|
+| *required*  | `name` | `string` | Parameter name.          |
+| *required*  | `type` | `Type`   | Parameter data type.     |
+| *optional*  | `doc`  | `string` | Parameter documentation. |
+
+Notes:
+1. The `name` and `type` of a `parameter` are immutable. To change them, a new 
overload must be created. Only the optional documentation field (`doc`) can be 
updated in-place.

Review Comment:
   Should `name` be immutable? Typically a function signature (as in Java) 
doesn't include parameter names.
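
To make the concern concrete, here is a minimal sketch of overload identity that is keyed on the ordered parameter types alone. `Parameter` mirrors the `name` and `type` fields of the table above; `signature_key` is a hypothetical helper, not something the spec or any Iceberg API defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str  # `name` field from the Parameter table
    type: str  # Iceberg type rendered as a string, e.g. "int"


def signature_key(parameters):
    """Hypothetical resolution key: ordered parameter types only, names ignored."""
    return tuple(p.type for p in parameters)


original = [Parameter("x", "int"), Parameter("y", "string")]
renamed = [Parameter("a", "int"), Parameter("b", "string")]
retyped = [Parameter("x", "long"), Parameter("y", "string")]

# Renaming parameters leaves the key unchanged; changing a type does not.
same_after_rename = signature_key(original) == signature_key(renamed)
same_after_retype = signature_key(original) == signature_key(retyped)
print(same_after_rename, same_after_retype)
```

Under that reading a rename would not need a new overload. If engines are expected to support invocation by parameter name, the names would become part of the caller-facing contract, which may be the reason for making them immutable.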



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that identifies the function, generated once at creation. |
+| *required*  | `format-version`  | `int`                   | Metadata format version (must be `1`). |
+| *required*  | `definitions`     | `array<overload>`       | List of function [overload](#overload) entities. |
+| *required*  | `definition-log`  | `array<definition-log>` | History of [definition snapshots](#definition-log). |
+| *required*  | `max-overload-id` | `long`                  | Highest `overload-id` currently assigned in this metadata file. Used to allocate new overload identifiers monotonically. |

Review Comment:
   > Highest `overload-id` currently assigned in this metadata file
   
   nit: Is this clearer?
   ```
   Highest `overload-id` currently assigned for this UDF
   ```
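
The difference between the two wordings shows up once an overload is removed. Below is a minimal sketch, assuming the next id is `max-overload-id + 1`, that the counter starts at 0, and that removing an overload does not lower the counter. None of these is stated in the spec text, and `add_overload` and `drop_overload` are hypothetical helpers operating on a trimmed-down metadata dict.

```python
import copy


def add_overload(metadata, parameters, return_type):
    """Return a copy of the metadata with one more overload appended.

    Assumption: the next id is max-overload-id + 1.
    """
    updated = copy.deepcopy(metadata)
    next_id = updated["max-overload-id"] + 1
    updated["definitions"].append({
        "overload-id": next_id,
        "parameters": parameters,
        "return-type": return_type,
    })
    updated["max-overload-id"] = next_id
    return updated


def drop_overload(metadata, overload_id):
    """Return a copy without the given overload; the counter is left untouched."""
    updated = copy.deepcopy(metadata)
    updated["definitions"] = [
        d for d in updated["definitions"] if d["overload-id"] != overload_id
    ]
    return updated


meta = {"max-overload-id": 0, "definitions": []}
meta = add_overload(meta, [{"name": "x", "type": "int"}], "int")        # id 1
meta = add_overload(meta, [{"name": "x", "type": "string"}], "string")  # id 2
meta = drop_overload(meta, 2)
meta = add_overload(meta, [{"name": "x", "type": "long"}], "long")      # id 3, not 2

ids = [d["overload-id"] for d in meta["definitions"]]
print(ids, meta["max-overload-id"])
```

After the drop, the highest id present in the file is 1 while the counter is still 2, so "assigned for this UDF" describes the field more accurately than "in this metadata file" if ids are never meant to be reused.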



##########
format/udf-spec.md:
##########
@@ -0,0 +1,285 @@
+---
+title: "SQL UDF Spec"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg SQL UDF Spec
+
+## Background and Motivation
+
+A SQL user-defined function (UDF or UDTF) is a callable routine that accepts 
input parameters, executes a function body.
+Depending on the function type, the result can be:
+
+- **UDFs** – return a scalar value, which may be a primitive type (e.g., 
`int`, `string`) or a non-primitive type (e.g., `struct`, `list`).
+- **Table functions (UDTFs)** – return a table, i.e., a table with zero or 
more rows and columns with a uniform schema.
+
+Many compute engines (e.g., Spark, Trino) already support UDFs, but in 
different and incompatible ways. Without a common
+standard, UDFs cannot be reliably shared across engines or reused in 
multi-engine environments.
+
+This specification introduces a standardized metadata format for UDFs in 
Iceberg. 
+
+## Goals
+
+* Define a portable metadata format for both scalar and table SQL UDFs. The 
metadata is self-contained and can be moved across catalogs. 
+* Support function evolution through versioning and rollback.
+* Provide consistent semantics for representing UDFs across engines.
+
+## Overview
+
+UDF metadata follows the same design principles as Iceberg table and view 
metadata: each function is represented by a
+**self-contained metadata file**. Metadata captures definitions, parameters, 
return types, documentation, security,
+properties, and engine-specific representations.
+
+* Any modification (new overload, updated representation, changed properties, 
etc.) creates a new metadata file, and atomically swaps in the new file as the 
current metadata.
+* Each metadata file includes recent definition versions, enabling rollbacks 
without external state.
+
+## Specification
+
+### UDF Metadata
+The UDF metadata file has the following fields:
+
+| Requirement | Field name        | Type                    | Description |
+|-------------|-------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| *required*  | `function-uuid`   | `string`                | A UUID that identifies the function, generated once at creation. |
+| *required*  | `format-version`  | `int`                   | Metadata format version (must be `1`). |
+| *required*  | `definitions`     | `array<overload>`       | List of function [overload](#overload) entities. |
+| *required*  | `definition-log`  | `array<definition-log>` | History of [definition snapshots](#definition-log). |
+| *required*  | `max-overload-id` | `long`                  | Highest `overload-id` currently assigned in this metadata file. Used to allocate new overload identifiers monotonically. |
+| *optional*  | `location`        | `string`                | Storage location of metadata files. |
+| *optional*  | `properties`      | `map`                   | Arbitrary key–value pairs. |
+| *optional*  | `secure`          | `boolean`               | Whether it is a secure function. Default: `false`. |
+| *optional*  | `doc`             | `string`                | Documentation string. |
+
+Notes:
+1. When `secure` is `true`,
+    - Engines **SHOULD NOT** expose the function definition through any 
inspection (e.g., `SHOW FUNCTIONS`).
+    - Engines **SHOULD** ensure that execution does not leak sensitive 
information through any channels, such as error messages, logs, or query plans.
+   
+### Overload
+
+Function overloads allow multiple implementations of the same function name 
with different signatures. Each overload has
+the following fields:
+
+| Requirement | Field name                 | Type                                          | Description |
+|-------------|----------------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------|
+| *required*  | `overload-id`              | `long`                                        | Monotonically increasing identifier of this function overload. |
+| *required*  | `parameters`               | `array<parameter>`                            | Ordered list of [function parameters](#parameter). Invocation order **must** match this list. |
+| *required*  | `return-type`              | `Type` (Iceberg data type; `struct` for UDTF) | Return type. Example: `"string"`, `"struct<...>"`. |
+| *required*  | `versions`                 | `array<overload-version>`                     | [Versioned implementations](#overload-version) of this overload. |
+| *required*  | `current-overload-version` | `long`                                        | Monotonically increasing identifier of the current overload version. |
+| *optional*  | `function-type`            | `enum { "udf", "udtf" }` (default `"udf"`)    | If `"udtf"`, `return-type` must be a `struct` describing the output schema. |
+| *optional*  | `doc`                      | `string`                                      | Documentation string. |
+
+### Parameter
+| Requirement | Field  | Type     | Description              |
+|-------------|--------|----------|--------------------------|
+| *required*  | `name` | `string` | Parameter name.          |
+| *required*  | `type` | `Type`   | Parameter data type.     |
+| *optional*  | `doc`  | `string` | Parameter documentation. |
+
+Notes:
+1. The `name` and `type` of a `parameter` are immutable. To change them, a new 
overload must be created. Only the optional documentation field (`doc`) can be 
updated in-place.
+2. The `return-type` is immutable. To change it, users must create a new 
overload and deprecate or remove the old one.
+
+### Overload-Version
+Each overload can evolve over time by introducing new versions. An overload 
version represents a specific implementation
+of the overload at a given point in time.
+
+| Requirement | Field name            | Type                                                                     | Description |
+|-------------|-----------------------|--------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
+| *required*  | `overload-version-id` | `long`                                                                   | Monotonically increasing identifier of the overload version. |
+| *required*  | `representations`     | `array<representation>`                                                  | [Dialect-specific implementations](#representation). |
+| *optional*  | `deterministic`       | `boolean` (default `false`)                                              | Whether the function is deterministic. |
+| *optional*  | `null-handling`       | `enum { "returns_null", "called_on_null" }` (default `"called_on_null"`) | Hint describing how the function behaves with NULL input values. See below for details. |
+| *required*  | `timestamp-ms`        | `long` (epoch millis)                                                    | Creation timestamp of this version. |
+
+Note:
+
+`null-handling` provides an optimization hint for query engines:

Review Comment:
   Do we need to call out that these two are mutually exclusive, either in the 
notes here or in the description column?
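
For context, the table declares `null-handling` as a single enum value with default `called_on_null`, so a reader that parses it strictly can only ever see one of the two. A minimal sketch of that reading follows; `NullHandling` and `parse_null_handling` are hypothetical names, not part of the spec or any Iceberg API.

```python
from enum import Enum


class NullHandling(Enum):
    RETURNS_NULL = "returns_null"
    CALLED_ON_NULL = "called_on_null"


def parse_null_handling(overload_version):
    """Read `null-handling` from an overload-version dict.

    A missing field falls back to the default listed in the table.
    """
    raw = overload_version.get("null-handling", "called_on_null")
    return NullHandling(raw)  # raises ValueError for any other value


print(parse_null_handling({}))
print(parse_null_handling({"null-handling": "returns_null"}))
```

A combined value such as `"returns_null,called_on_null"` is rejected with a `ValueError` here. If the enum type is considered self-explanatory, a one-line note may still help readers who skim only the description column.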



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

