stevenzwu commented on code in PR #12584:
URL: https://github.com/apache/iceberg/pull/12584#discussion_r3370074048
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4387,338 +4367,393 @@ components:
items:
$ref: '#/components/schemas/TableIdentifier'
- ListFunctionsResponse:
+ ListNamespacesResponse:
type: object
properties:
next-page-token:
$ref: '#/components/schemas/PageToken'
- identifiers:
+ namespaces:
type: array
uniqueItems: true
items:
- $ref: '#/components/schemas/CatalogObjectIdentifier'
+ $ref: '#/components/schemas/Namespace'
- ListNamespacesResponse:
+ UpdateNamespacePropertiesResponse:
type: object
+ required:
+ - updated
+ - removed
properties:
- next-page-token:
- $ref: '#/components/schemas/PageToken'
- namespaces:
+ updated:
+ description: List of property keys that were added or updated
type: array
uniqueItems: true
items:
- $ref: '#/components/schemas/Namespace'
-
- LoadFunctionResult:
- description: |
- Result returned when a function is loaded from the catalog.
-
+ type: string
+ removed:
+ description: List of properties that were removed
+ type: array
+ items:
+ type: string
+ missing:
+ type: array
+ items:
+ type: string
+ description:
+ List of properties requested for removal that were not found
+ in the namespace's properties. Represents a partial success
response.
+ Server's do not need to implement this.
+ nullable: true
- The function metadata JSON is returned in the `metadata` field. The
location of the metadata
- file is returned in the `metadata-location` field, if available.
+ CommitTableResponse:
type: object
required:
+ - metadata-location
- metadata
properties:
metadata-location:
type: string
metadata:
- $ref: '#/components/schemas/FunctionMetadata'
-
- FunctionMetadata:
- description: |
- Portable UDF metadata format.
-
+ $ref: '#/components/schemas/TableMetadata'
- Each function is represented by a self-contained metadata file. The
`format-version` field
- identifies the UDF metadata format.
+ QueryEventsResponse:
type: object
required:
- - function-uuid
- - format-version
- - definitions
- - definition-log
+ - continuation-token
+ - events
properties:
- function-uuid:
- type: string
- format: uuid
- description: A UUID that identifies this UDF, generated once at
creation.
- format-version:
- type: integer
- description: UDF specification format version (must be 1).
- minimum: 1
- maximum: 1
- definitions:
- type: array
- description: List of function definition entities.
- items:
- $ref: '#/components/schemas/FunctionDefinition'
- definition-log:
+ continuation-token:
+ type: string
+ description: >
+ An opaque continuation token to fetch the next page of events.
+ This token encodes the server's cursor position and filter state.
+ Clients should treat this as an opaque value and pass it
unmodified in subsequent requests.
+ When no more events are currently available, the server returns an
empty `events` array
+ and a `continuation-token` that the client can re-issue later to
receive events that
+ occur after this point.
+ events:
type: array
- description: History of versions within the function's definitions.
items:
- $ref: '#/components/schemas/FunctionDefinitionLogEntry'
- location:
- type: string
- description: The function's base location. This is used to store
function metadata files.
- properties:
- type: object
- description: A string-to-string map of properties.
- additionalProperties:
- type: string
- secure:
- type: boolean
- description: Whether it is a secure function.
- default: false
- doc:
- type: string
- description: Documentation string.
+ $ref: "#/components/schemas/Event"
- FunctionDefinition:
+ Event:
type: object
required:
- - definition-id
- - parameters
- - return-type
- - versions
- - current-version-id
- - function-type
+ - event-id
+ - request-id
+ - request-event-count
+ - timestamp-ms
+ - operation
properties:
- definition-id:
+ event-id:
type: string
- description: A canonical string derived from the parameter types,
formatted as a comma-separated list with no spaces.
- parameters:
- type: array
- description: Ordered list of function parameters. Invocation order
must match this list.
- items:
- $ref: '#/components/schemas/FunctionParameter'
- return-type:
- $ref: '#/components/schemas/FunctionDataType'
- return-nullable:
- type: boolean
- description: A hint to indicate whether the return value is nullable
or not.
- default: true
- versions:
- type: array
- description: Versioned implementations of this definition.
- items:
- $ref: '#/components/schemas/FunctionDefinitionVersion'
- current-version-id:
- type: integer
- description: Identifier of the current version for this definition.
- function-type:
- type: string
- description: Function type. When set to "udtf", "return-type" must
be a struct describing the output schema.
- enum: ["udf", "udtf"]
- doc:
+ description: Unique ID of this event. Clients should perform
deduplication based on this ID.
+ request-id:
+ description: >
+ Opaque ID of the request this change belongs to.
+ This ID can be used to identify events that were part of the same
request.
type: string
- description: Documentation string.
-
- FunctionParameter:
+ request-event-count:
+ type: integer
+ description: >
+ Number of events produced by the originating catalog request (e.g.
updateTable
+ or commitTransaction) that generated this event. Such requests can
apply multiple
+ updates atomically, each surfaced as a separate event sharing the
same `request-id`;
+ this count reports how many events that originating request
produced in total.
+ timestamp-ms:
+ type: integer
+ format: int64
+ description: >
+ Timestamp when this event occurred (epoch milliseconds).
+ Timestamps are not guaranteed to be unique. Typically all events in
+ a transaction will have the same timestamp.
+ actor:
+ type: object
+ description: >
+ The actor who performed the operation, such as a user or service
account.
+ Implementations may add arbitrary additional fields; the optional
`id` field is
+ recommended as a portable identifier that consumers can render and
key on.
+ properties:
+ id:
+ type: string
+ description: >
+ Recommended, optional. Stable identifier of the actor (e.g. a
user or
+ service account). Provided when the server can attribute the
operation.
+ additionalProperties: true
+ operation:
+ description: >
+ The operation that was performed, such as creating or updating a
table.
+ The concrete type is selected by the `operation-type`
discriminator defined on
+ `BaseOperation`. Clients should discard events with unknown
operation types.
+ Operations are emitted only when the underlying change is
committed; staged changes
+ are not surfaced.
+ anyOf:
+ - $ref: "#/components/schemas/CreateTableOperation"
+ - $ref: "#/components/schemas/RegisterTableOperation"
+ - $ref: "#/components/schemas/DropTableOperation"
+ - $ref: "#/components/schemas/UpdateTableOperation"
+ - $ref: "#/components/schemas/RenameTableOperation"
+ - $ref: "#/components/schemas/CreateViewOperation"
+ - $ref: "#/components/schemas/DropViewOperation"
+ - $ref: "#/components/schemas/UpdateViewOperation"
+ - $ref: "#/components/schemas/RenameViewOperation"
+ - $ref: "#/components/schemas/CreateNamespaceOperation"
+ - $ref: "#/components/schemas/UpdateNamespacePropertiesOperation"
+ - $ref: "#/components/schemas/DropNamespaceOperation"
+
+ BaseOperation:
type: object
+ description: >
+ Base type for all catalog operations carried by an `Event`. The
`operation-type`
+ discriminator selects the concrete operation schema. Standard
operation types are
+ enumerated in `OperationType`. Implementation-specific operation types
use an `x-` prefix
+ (see `OperationType`) and are carried as a `BaseOperation` with
implementation-defined
+ additional properties; clients should discard operations with unknown
`operation-type`
Review Comment:
On reflection, this is over-specifying parser implementation. Apologies — my
earlier comments pushed in this direction; on closer look, the existing
precedent goes the other way.
The two schemas in the spec where Java parsers actually fall through to an
Unknown variant — `ReportMetricsRequest` and `ViewRepresentation` — have **no**
prose at all telling clients what to do with unknown types. They just list the
known concrete subtypes and stop:
Whether and how to dispatch unknowns is an impl decision, not a wire-level
contract. The "clients should ignore (discard) events with unknown operation
types" + `x-` prefix guidance already lives on `OperationType.description` —
one place is enough.
Suggest trimming to:
```
Base type for all catalog operations carried by an `Event`. The
`operation-type`
discriminator selects the concrete operation schema. Standard operation
types are
enumerated in `OperationType`; implementation-specific operation types use an
`x-` prefix (see `OperationType`).
```
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4387,338 +4367,393 @@ components:
items:
$ref: '#/components/schemas/TableIdentifier'
- ListFunctionsResponse:
+ ListNamespacesResponse:
type: object
properties:
next-page-token:
$ref: '#/components/schemas/PageToken'
- identifiers:
+ namespaces:
type: array
uniqueItems: true
items:
- $ref: '#/components/schemas/CatalogObjectIdentifier'
+ $ref: '#/components/schemas/Namespace'
- ListNamespacesResponse:
+ UpdateNamespacePropertiesResponse:
type: object
+ required:
+ - updated
+ - removed
properties:
- next-page-token:
- $ref: '#/components/schemas/PageToken'
- namespaces:
+ updated:
+ description: List of property keys that were added or updated
type: array
uniqueItems: true
items:
- $ref: '#/components/schemas/Namespace'
-
- LoadFunctionResult:
- description: |
- Result returned when a function is loaded from the catalog.
-
+ type: string
+ removed:
+ description: List of properties that were removed
+ type: array
+ items:
+ type: string
+ missing:
+ type: array
+ items:
+ type: string
+ description:
+ List of properties requested for removal that were not found
+ in the namespace's properties. Represents a partial success
response.
+ Server's do not need to implement this.
+ nullable: true
- The function metadata JSON is returned in the `metadata` field. The
location of the metadata
- file is returned in the `metadata-location` field, if available.
+ CommitTableResponse:
type: object
required:
+ - metadata-location
- metadata
properties:
metadata-location:
type: string
metadata:
- $ref: '#/components/schemas/FunctionMetadata'
-
- FunctionMetadata:
- description: |
- Portable UDF metadata format.
-
+ $ref: '#/components/schemas/TableMetadata'
- Each function is represented by a self-contained metadata file. The
`format-version` field
- identifies the UDF metadata format.
+ QueryEventsResponse:
type: object
required:
- - function-uuid
- - format-version
- - definitions
- - definition-log
+ - continuation-token
+ - events
properties:
- function-uuid:
- type: string
- format: uuid
- description: A UUID that identifies this UDF, generated once at
creation.
- format-version:
- type: integer
- description: UDF specification format version (must be 1).
- minimum: 1
- maximum: 1
- definitions:
- type: array
- description: List of function definition entities.
- items:
- $ref: '#/components/schemas/FunctionDefinition'
- definition-log:
+ continuation-token:
+ type: string
+ description: >
+ An opaque continuation token to fetch the next page of events.
+ This token encodes the server's cursor position and filter state.
+ Clients should treat this as an opaque value and pass it
unmodified in subsequent requests.
+ When no more events are currently available, the server returns an
empty `events` array
+ and a `continuation-token` that the client can re-issue later to
receive events that
+ occur after this point.
+ events:
type: array
- description: History of versions within the function's definitions.
items:
- $ref: '#/components/schemas/FunctionDefinitionLogEntry'
- location:
- type: string
- description: The function's base location. This is used to store
function metadata files.
- properties:
- type: object
- description: A string-to-string map of properties.
- additionalProperties:
- type: string
- secure:
- type: boolean
- description: Whether it is a secure function.
- default: false
- doc:
- type: string
- description: Documentation string.
+ $ref: "#/components/schemas/Event"
- FunctionDefinition:
+ Event:
type: object
required:
- - definition-id
- - parameters
- - return-type
- - versions
- - current-version-id
- - function-type
+ - event-id
+ - request-id
+ - request-event-count
+ - timestamp-ms
+ - operation
properties:
- definition-id:
+ event-id:
type: string
- description: A canonical string derived from the parameter types,
formatted as a comma-separated list with no spaces.
- parameters:
- type: array
- description: Ordered list of function parameters. Invocation order
must match this list.
- items:
- $ref: '#/components/schemas/FunctionParameter'
- return-type:
- $ref: '#/components/schemas/FunctionDataType'
- return-nullable:
- type: boolean
- description: A hint to indicate whether the return value is nullable
or not.
- default: true
- versions:
- type: array
- description: Versioned implementations of this definition.
- items:
- $ref: '#/components/schemas/FunctionDefinitionVersion'
- current-version-id:
- type: integer
- description: Identifier of the current version for this definition.
- function-type:
- type: string
- description: Function type. When set to "udtf", "return-type" must
be a struct describing the output schema.
- enum: ["udf", "udtf"]
- doc:
+ description: Unique ID of this event. Clients should perform
deduplication based on this ID.
+ request-id:
+ description: >
+ Opaque ID of the request this change belongs to.
+ This ID can be used to identify events that were part of the same
request.
type: string
- description: Documentation string.
-
- FunctionParameter:
+ request-event-count:
+ type: integer
+ description: >
+ Number of events produced by the originating catalog request (e.g.
updateTable
+ or commitTransaction) that generated this event. Such requests can
apply multiple
+ updates atomically, each surfaced as a separate event sharing the
same `request-id`;
+ this count reports how many events that originating request
produced in total.
+ timestamp-ms:
+ type: integer
+ format: int64
+ description: >
+ Timestamp when this event occurred (epoch milliseconds).
+ Timestamps are not guaranteed to be unique. Typically all events in
+ a transaction will have the same timestamp.
+ actor:
+ type: object
+ description: >
+ The actor who performed the operation, such as a user or service
account.
+ Implementations may add arbitrary additional fields; the optional
`id` field is
+ recommended as a portable identifier that consumers can render and
key on.
+ properties:
+ id:
+ type: string
+ description: >
+ Recommended, optional. Stable identifier of the actor (e.g. a
user or
+ service account). Provided when the server can attribute the
operation.
+ additionalProperties: true
+ operation:
+ description: >
+ The operation that was performed, such as creating or updating a
table.
+ The concrete type is selected by the `operation-type`
discriminator defined on
+ `BaseOperation`. Clients should discard events with unknown
operation types.
Review Comment:
Same over-specification concern as on `BaseOperation`. Apologies — my
earlier comments pushed this in.
`ReportMetricsRequest` and `ViewRepresentation` (the two existing schemas
with parser-side graceful fallback) carry no "clients should discard" prose in
their spec descriptions; the contract is conveyed by typing the discriminator
field as `type: string` and stating it on the type definition. The same is true
here — `OperationType.description` already says clients should discard unknown
values; duplicating it on this property is redundant and prescribes parser
behavior the spec doesn't need to dictate.
Suggest trimming to:
```
The operation that was performed, such as creating or updating a table. The
concrete type is selected by the `operation-type` discriminator. Operations
are
emitted only when the underlying change is committed; staged changes are not
surfaced.
```
Keeps the committed-only emission rule (a real wire-level contract) and
drops the parser-implementation prescription.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3708,6 +3778,74 @@ components:
allOf:
- $ref: '#/components/schemas/ScanTasks'
+ QueryEventsRequest:
+ type: object
+ properties:
+ continuation-token:
+ type: string
+ description: >
+ A continuation token returned by a previous response. Clients
should treat the
+ token as an opaque value and pass it unmodified. If not provided,
events are
+ returned from the beginning of the event log subject to other
filters.
+ page-size:
+ type: integer
+ format: int32
+ minimum: 1
+ description: >
+ The maximum number of events to return in a single response.
+ If not provided, the server may choose a default page size.
+ Servers may return less results than requested for various
reasons, such as
+ server side limits, payload size or processing time.
+ since-timestamp-ms:
+ type: integer
+ format: int64
+ description: >
+ Optional starting timestamp (seek-to-timestamp) for the initial
request, in
+ milliseconds (inclusive). Lets clients begin consumption from a
rough point in time
+ without iterating the full history. If not provided, no filtering
based on timestamp
+ values is applied.
+ operation-types:
+ type: array
+ items:
+ $ref: "#/components/schemas/OperationType"
+ description: >
+ Filter events by the type of operation.
+ If not provided, all types are returned.
+ catalog-objects-by-name:
+ type: array
+ items:
+ $ref: "#/components/schemas/CatalogObjectIdentifier"
Review Comment:
The kind-agnostic, prefix-matching design is the right call — composing
`catalog-objects-by-name` with `object-types` is symmetric with the
`catalog-objects-by-uuid` + `object-types` pattern one field below.
The current description carries the load-bearing principles but stacks them
into one dense paragraph: matching rule, kind-independence, subtree-expansion,
leaf-vs-namespace, sibling-exclusion, and (implicitly) cross-kind collision are
interleaved across successive clauses. A reordered version that leads with the
matching rule, then separates leaf-vs-namespace cases, then handles the
collision corner:
```
Each filter entry has the same shape as a `CatalogObjectIdentifier`: an array
of name levels — e.g. `["a","b","t1"]` is the table `a.b.t1`. The filter
matches an event when the entry's levels equal the event object's path, or
form a leading prefix of it. Comparison is level-by-level: each array element
is one path segment, so `["acc"]` is a prefix of `["acc","tax"]` but does
not match `["accounting"]`.
Matching is by name only, independent of object kind. Tables and views are
leaves of the name tree, so a leaf entry matches a single object —
`["a","b","t1"]` matches only the table or view `a.b.t1`. A namespace entry
expands recursively: `["a","b"]` matches the namespace object `a.b` itself
plus every descendant (`a.b.t1`, `a.b.v1`, sub-namespace `a.b.c`, and every
object under it). It does not match the sibling namespace `a.c`.
If a path exists as more than one kind (e.g. a namespace and a same-named
table), the filter matches events for all of them; combine with
`object-types`
to narrow. If not provided, events for all objects must be returned subject
to other filters.
```
Three structural moves vs. the current text:
1. Lead with the matching rule as a self-contained block —
`CatalogObjectIdentifier` shape, exact-or-prefix match, level-by-level, with a
negative example. One rule before any cases.
2. Separate leaf entries from namespace entries explicitly — makes the
implicit subtree expansion visible (a reader filtering by `["accounting"]` to
"watch the namespace" will realize they're subscribing to the entire subtree).
3. Surface the cross-kind name-collision case (which the current description
doesn't mention but the discussion above this comment establishes as
design-intentional) right next to the `object-types` guidance, so the
resolution path is in the same paragraph as the problem.
Same semantic content, ordered to be skimmable on first read.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]