Thanks for bringing this up, Hongyue. I think the logic here makes sense
and that `commit(base, new)` probably isn't a good API to use for
`registerTable`. But my main objection is that I don't think that it makes
sense to use `TableOperations` for this. Adding a `set` method is awkward
because the table may not already exist.

Why not have catalogs implement `registerTable` directly? For instance,
JDBC could run an INSERT query.

Ryan

On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote:

> Hey Iceberg devs:
>
>   While implementing the overwrite option for registering an external
> table (see PR12228), I realized we might want to evaluate the option to add
> a new method *set(metadata)* on TableOperations interfaces for
> unconditionally set latest table metadata. After some discussions with
> Steven and Russell, I want to seek opinions from the community.
>
> Today, the register-table under the hood use the commit API from
> TableOperations, but it might not work well with overwrite registration due
> to given assumptions
>
>    1.
>
>    commit (base, new) is designed to avoid overwriting updates and
>    mandate the provided base metadata is the same as the current metadata of
>    the table. However for overwrite registration, the end goal is to reset
>    table metadata to a desired state, so we do not want to retry on failure
>    even if the base table state changes.
>    2.
>
>    commit(base, new) will always write new metadata.json files in case of
>    successful commit. For a successful registration. The file that is
>    being passed is the file that should be used for registration.
>    Previous workarounds (e.g., PR6591) reused user-provided metadata only for
>    new tables, but cannot generalize to the overwrite case as it cannot
>    differentiate normal update and overwrite registration at the
>    TableOperations layer.
>    3.
>
>    When committing fails, the system attempts to delete the new metadata
>    file, presuming it was authored by the committer. This is not appropriate
>    for registration scenarios where the metadata file might have been
>    generated elsewhere (see PR13169)
>
>
> One alternative—dropping and recreating the table—raises concerns about
> atomicity, since it can leave the table in an invalid intermediate state if
> a failure occurs between drop and creation.
>
> Given this, I would like to share proposed new API to be add:
>
> *  /***
>
> *   * *Atomically set the provided table metadata as current, bypassing
> base state checks.
>
> *   * @param metadata *the new table metadata to make current
>
> *   */*
>
> *  void set(TableMetadata metadata);*
>
>
> PR12228 Add overwrite option when register external table to catalog:
> https://github.com/apache/iceberg/pull/12228
>
> PR6591 Avoid creating new metadata file when registerTable API is used:
> https://github.com/apache/iceberg/pull/6591
> PR13169 Only remove metadata files if TableOp created a new one:
> github.com/apache/iceberg/pull/13169
>
> Thanks,
>
> Hongyue Zhang
>
>
>

Reply via email to