rraulinio opened a new issue, #1360: URL: https://github.com/apache/iceberg-go/issues/1360
### Feature Request / Improvement

# Add SQL UDF metadata model and REST read support

Spec: https://iceberg.apache.org/udf-spec/

## Background

Apache Iceberg 1.11.0 introduced a **SQL UDF specification**. In plain terms, a catalog UDF is a named, reusable SQL routine stored in the catalog. A scalar UDF is a reusable expression, and a UDTF is closer to a parameterized view that returns rows. For example, a catalog can define `add_one(x int)` once, and engines such as Spark or Trino can discover it from the catalog and use the SQL representation written for their dialect.

The metadata design deliberately mirrors Iceberg tables and views: each function is represented by one self-contained, immutable JSON metadata file. The catalog maps the function name to the current metadata file location and updates that mapping with an atomic swap. All overloads of a name live in a single metadata file, each overload has its own version history, and a `definition-log` records the selected definition-version mappings over time so the function can be rolled back without external state.

## What this unlocks

- Go clients can discover and read catalog-managed UDFs from REST catalogs that serve them.
- Go-based catalog servers and tools get a shared, spec-validated model for UDF metadata instead of each reinventing JSON handling, similar to the role the existing `view` package plays for views.

## Proposed decomposition: two PRs

The split keeps review focused: **PR 1 is reviewable purely against the UDF format spec** with no HTTP involved, and **PR 2 is reviewable purely against the REST OpenAPI**. PR 2 depends on PR 1.

- [ ] **PR 1: `udf` metadata model package.** Add a new package named `udf`, matching Java's `org.apache.iceberg.udf` package and the spec title.

  The package would mirror the structure of the existing `view` package:

  - metadata model: `function-uuid`, `format-version`, `definitions`, `definition-log`, `location`, `properties`, `secure`, `doc`;
  - definitions: `parameters`, `return-type`, `function-type` (`udf` or `udtf`), `versions`, `current-version-id`, and the optional `return-nullable`, `doc`, and `specific-name`;
  - definition versions: `representations`, `deterministic`, `on-null-input`, `timestamp-ms`;
  - SQL representations (one per dialect per version);
  - an unknown-representation fallback for forward-compatible round-tripping;
  - parsing and serialization;
  - validation of the spec invariants;
  - a metadata builder for writers.

  Important validation points include one definition per signature, `current-version-id` referencing an existing version, UDTF `return-type` being a struct, unique `specific-name` values when present, and the canonical `definition-id` serializer for parameter types, for example: `int,list<int>,struct<id:int,name:string>`.

  `specific-name` is present in the current published UDF spec on Iceberg main/latest and was added after the 1.11.0 tag. Because it is optional and additive, the model can include it while older metadata simply omits it.

  One design note: UDF types are Iceberg types **without field IDs**. The spec says extra fields must be ignored, whereas iceberg-go's schema JSON parsing expects field IDs, so the existing schema types cannot be reused as-is. The package should therefore carry a small UDF type representation (`primitive string | list | map | struct`).

  The spec's Appendix A and Appendix B examples should become golden test fixtures.

- [ ] **PR 2: REST catalog read-side client.** Add read-side function support from the current REST OpenAPI:

  - `GET /v1/{prefix}/namespaces/{namespace}/functions`
  - `GET /v1/{prefix}/namespaces/{namespace}/functions/{function}`

  These should be gated by existing capability discovery.
  Per the REST spec, function endpoints are **not** part of the assumed default endpoint set, so they should only be used when the server advertises them in `ConfigResponse.endpoints`.

  Add an optional `FunctionCatalog` capability interface, following the same type-assertion pattern used for optional catalog capabilities such as `TransactionalCatalog` and `PurgeableTable`. The initial read-side surface would include:

  - `ListFunctions`: paginated iterator, similar to `ListViews`.
  - `LoadFunction`: returns the identifier, parsed metadata, and optional metadata location.
  - `CheckFunctionExists`: implemented via a `LoadFunction` (GET) fallback, since the REST spec defines no HEAD endpoint for functions.

  Wire-shape notes for reviewers:

  - `listFunctions` returns `CatalogObjectIdentifier`, a flat string array of hierarchy levels such as `["accounting", "tax", "paid"]`, not the `{namespace, name}` object shape used by tables and views.
  - `loadFunction` returns all overloads in one metadata response.
  - Add a new sentinel `ErrNoSuchFunction`, mapped from the server's `NoSuchFunctionException` 404 response.

## Non-goals for now

- **No create/replace/drop client methods.** There are no standardized REST write endpoints to call yet, and adding non-spec write calls would diverge from upstream. PR 1's metadata builder gives writers what they need; client CRUD can follow once the upstream REST write proposal lands.
- **No UDF execution.** Resolving overloads and running the SQL body is the query engine's job. This issue covers only catalog metadata modeling and REST read-side plumbing.
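The UDF type representation and the canonical `definition-id` serializer from PR 1 could look roughly like the sketch below. The type names (`Type`, `Primitive`, `List`, `Map`, `Field`, `Struct`) and `DefinitionID` are hypothetical. The list and struct renderings follow the example in PR 1. The `map<key,value>` rendering is an assumption and should be checked against the spec.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a hypothetical UDF type: a primitive name, or a list, map, or
// struct of nested types. Unlike table schemas, there are no field IDs.
type Type interface {
	canonical() string
}

// Primitive is a primitive type name such as "int" or "string".
type Primitive string

// List renders as list<element>.
type List struct{ Element Type }

// Map renders as map<key,value> (assumed form, not confirmed by the issue).
type Map struct{ Key, Value Type }

// Field is a named struct field; note the absence of an ID.
type Field struct {
	Name string
	Type Type
}

// Struct renders as struct<name:type,...>, preserving field order.
type Struct struct{ Fields []Field }

func (p Primitive) canonical() string { return string(p) }
func (l List) canonical() string      { return "list<" + l.Element.canonical() + ">" }
func (m Map) canonical() string {
	return "map<" + m.Key.canonical() + "," + m.Value.canonical() + ">"
}

func (s Struct) canonical() string {
	parts := make([]string, len(s.Fields))
	for i, f := range s.Fields {
		parts[i] = f.Name + ":" + f.Type.canonical()
	}
	return "struct<" + strings.Join(parts, ",") + ">"
}

// DefinitionID joins the canonical parameter types with commas. Two
// overloads with the same DefinitionID would violate "one definition per
// signature".
func DefinitionID(params []Type) string {
	parts := make([]string, len(params))
	for i, p := range params {
		parts[i] = p.canonical()
	}
	return strings.Join(parts, ",")
}

func main() {
	id := DefinitionID([]Type{
		Primitive("int"),
		List{Element: Primitive("int")},
		Struct{Fields: []Field{{Name: "id", Type: Primitive("int")}, {Name: "name", Type: Primitive("string")}}},
	})
	fmt.Println(id) // int,list<int>,struct<id:int,name:string>
}
```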
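The PR 1 validation points fit into a single pass over the definitions. The sketch below uses a deliberately trimmed, hypothetical model (`Definition`, `Version`, and a precomputed `ReturnIsStruct` flag) rather than the full metadata. It collects every violation with `errors.Join` instead of stopping at the first one.

```go
package main

import (
	"errors"
	"fmt"
)

// Version and Definition are a hypothetical, trimmed model holding only
// what the checks below need.
type Version struct{ VersionID int }

type Definition struct {
	DefinitionID     string // canonical parameter-type signature, e.g. "int,list<int>"
	FunctionType     string // "udf" or "udtf"
	ReturnIsStruct   bool   // whether return-type is a struct
	SpecificName     string // optional; "" when absent
	CurrentVersionID int
	Versions         []Version
}

// Validate checks the invariants listed in PR 1 and reports all violations.
func Validate(defs []Definition) error {
	var errs []error
	signatures := map[string]bool{}
	specificNames := map[string]bool{}
	for _, d := range defs {
		// One definition per signature.
		if signatures[d.DefinitionID] {
			errs = append(errs, fmt.Errorf("duplicate definition for signature (%s)", d.DefinitionID))
		}
		signatures[d.DefinitionID] = true

		// current-version-id must reference an existing version.
		found := false
		for _, v := range d.Versions {
			if v.VersionID == d.CurrentVersionID {
				found = true
				break
			}
		}
		if !found {
			errs = append(errs, fmt.Errorf("(%s): current-version-id %d has no matching version", d.DefinitionID, d.CurrentVersionID))
		}

		// A UDTF's return-type must be a struct.
		if d.FunctionType == "udtf" && !d.ReturnIsStruct {
			errs = append(errs, fmt.Errorf("(%s): udtf return-type must be a struct", d.DefinitionID))
		}

		// specific-name, when present, must be unique.
		if d.SpecificName != "" {
			if specificNames[d.SpecificName] {
				errs = append(errs, fmt.Errorf("duplicate specific-name %q", d.SpecificName))
			}
			specificNames[d.SpecificName] = true
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	defs := []Definition{
		{DefinitionID: "int", FunctionType: "udf", CurrentVersionID: 1, Versions: []Version{{VersionID: 1}}},
		{DefinitionID: "int", FunctionType: "udtf", CurrentVersionID: 2, Versions: []Version{{VersionID: 1}}},
	}
	fmt.Println(Validate(defs))
}
```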
