[I] REST Catalog: expose a staged create-or-replace transaction primitive for non-Java clients [iceberg]

via GitHub Tue, 09 Jun 2026 02:34:47 -0700


malon64 opened a new issue, #16741:
URL: https://github.com/apache/iceberg/issues/16741


   ### Feature Request / Improvement
   
   Iceberg Java exposes a high-level API for create-or-replace table semantics 
through:
   
   ```java
   catalog.buildTable(identifier, schema)
       .createOrReplaceTransaction();
   ```
   
   This is very useful because the client does not need to implement the full 
replacement logic manually. The Java implementation can decide whether the 
operation is a create or a replace, build the correct table metadata, assign 
field IDs correctly, prepare the replacement transaction, and commit it using 
Iceberg’s transaction model.
   
   However, for non-Java clients that interact with Iceberg only through the 
REST Catalog API, there does not seem to be an equivalent high-level primitive.
   
   The REST API exposes the lower-level commit mechanism through 
`CommitTableRequest` / `UpdateTableRequest`, but it does not expose a staged 
create-or-replace transaction workflow equivalent to Java’s 
`createOrReplaceTransaction()`.
   
   This creates a difficult situation for clients such as C++ engines:
   
   * Implementing `CREATE OR REPLACE TABLE` as `DROP TABLE` followed by `CREATE 
TABLE` is not atomic.
   * If the drop succeeds and the create fails, the table is left missing from the catalog.
   * If another writer acts between the two calls, the catalog state can become 
inconsistent from the user’s point of view.
   * Reimplementing Java’s replacement planning logic outside the Java library 
is complex and easy to get wrong.
   * A naïve REST metadata update is also not necessarily equivalent to a true 
Iceberg replace transaction.
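
The first two failure modes can be illustrated with a toy in-memory catalog. This is only a sketch: `InMemoryCatalog`, `drop_then_create`, and the `fail` flag are hypothetical names invented for illustration and do not correspond to any Iceberg API.

```python
class InMemoryCatalog:
    """Toy stand-in for a catalog (hypothetical, not the Iceberg REST API)."""

    def __init__(self):
        self.tables = {}

    def drop_table(self, name):
        self.tables.pop(name, None)

    def create_table(self, name, columns, fail=False):
        # `fail` simulates a network error or a rejected request.
        if fail:
            raise RuntimeError("create failed (simulated)")
        if name in self.tables:
            raise ValueError(f"table {name} already exists")
        self.tables[name] = list(columns)


def drop_then_create(catalog, name, columns, fail_create=False):
    """Non-atomic emulation of CREATE OR REPLACE TABLE."""
    catalog.drop_table(name)
    # Window: the table does not exist here, and another writer could act.
    catalog.create_table(name, columns, fail=fail_create)


catalog = InMemoryCatalog()
catalog.create_table("db.t", ["id", "name"])

try:
    drop_then_create(catalog, "db.t", ["id", "email"], fail_create=True)
except RuntimeError:
    pass

# The original table was dropped and nothing replaced it.
table_survived = "db.t" in catalog.tables
```

In this run the drop has already taken effect when the create fails, so the client has no table left to fall back to.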
   
   ### Motivation
   
   I am looking at this from the perspective of a non-Java REST client, 
specifically a C++ engine integration.
   
   For Java engines, this is mostly hidden behind the Iceberg Java API. For 
example, Trino can rely on the Iceberg Java catalog/transaction APIs, even when 
using a REST catalog.
   
   For C++ clients, there is no equivalent API available through the REST 
specification. The client either has to:
   
   1. fall back to unsafe `DROP` + `CREATE` behavior, or
   2. manually reconstruct the replacement metadata, requirements, 
snapshot/reference changes, field IDs, partition spec handling, and concurrency 
checks.
   
   Neither option is ideal.
   
   The missing piece is not necessarily the final commit endpoint. The existing 
`UpdateTableRequest` / `CommitTableRequest` mechanism may still be the right 
final commit primitive. What is missing is a standardized staged planning 
operation that gives non-Java clients the same safe replacement semantics that 
Java clients get from `createOrReplaceTransaction()`.
   
   ### Why a simple final `create-or-replace` endpoint may not be enough
   
   In the Java REST implementation, the choice between create and replace 
cannot simply be deferred to the final commit step: the two paths can assign 
different field IDs, and those IDs may already be written into data and 
metadata files before the transaction is committed.
   
   So this probably should not be just:
   
   ```http
   POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/create-or-replace
   ```
   
   as a final commit call.
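
A rough sketch of why the decision matters, using flat schemas only. It assumes, as a simplification of what the Java library is understood to do, that a create numbers fields from 1 while a replace reuses the IDs of columns that already exist by name and allocates new IDs above the table's last assigned column ID. Both function names are hypothetical.

```python
def assign_ids_for_create(columns):
    """Assumed create behavior: fresh IDs starting at 1, in column order."""
    return {name: i for i, name in enumerate(columns, start=1)}


def assign_ids_for_replace(columns, existing_ids, last_column_id):
    """Assumed replace behavior: reuse IDs by name, allocate new ones above
    the highest ID the table has ever assigned."""
    next_id = last_column_id
    assigned = {}
    for name in columns:
        if name in existing_ids:
            assigned[name] = existing_ids[name]
        else:
            next_id += 1
            assigned[name] = next_id
    return assigned


new_columns = ["email", "id"]
existing_ids = {"id": 1, "name": 2}  # schema of the table being replaced

as_create = assign_ids_for_create(new_columns)
as_replace = assign_ids_for_replace(new_columns, existing_ids, last_column_id=2)
```

Under these assumptions the same column list yields `{"email": 1, "id": 2}` when planned as a create and `{"email": 3, "id": 1}` when planned as a replace. Data files written with one numbering would not match metadata committed with the other, which is why the client needs the answer before it starts writing.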
   
   A more useful design may be a staged operation, for example a REST-level 
equivalent of:
   
   ```java
   createOrReplaceTransaction()
   ```
   
   that lets the client know, before writing data files, whether the operation 
is being planned as a create or a replace and what metadata/IDs should be used.
   
   ### Possible direction
   
   Would it make sense for the REST Catalog spec to expose a staged 
create-or-replace / staged replace transaction workflow?
   
   For example, something conceptually similar to:
   
   ```http
   POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/stage-create-or-replace
   ```
   
   or an extension of the existing staged create flow with a create-or-replace 
mode.
   
   The response could provide enough information for non-Java clients to 
continue safely, such as:
   
   * whether the operation is a create or a replace
   * the planned table metadata
   * assigned schema/field IDs
   * partition spec and sort order IDs
   * table location
   * the optimistic concurrency requirements to assert at commit time
   * metadata updates needed for the final commit
   * credentials/config needed to write data files
   
   Then the final commit could still use the existing REST commit mechanism.
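
As a sketch of how a client might consume such a response, the snippet below models it as a small data class and folds it into a commit body with `requirements` and `updates` lists. `StagedPlan`, its fields, and `to_commit_body` are hypothetical, and the UUID, location, and snapshot values are placeholders. The `assert-table-uuid` requirement and the `set-location` and `add-snapshot` update actions are meant to mirror names used by the existing REST commit request, with the snapshot object heavily abbreviated.

```python
from dataclasses import dataclass, field


@dataclass
class StagedPlan:
    """Hypothetical response of a staged create-or-replace call."""
    mode: str                 # "create" or "replace"
    location: str             # table location to write data files under
    field_ids: dict           # column name -> assigned field ID
    requirements: list = field(default_factory=list)
    updates: list = field(default_factory=list)


def to_commit_body(plan, extra_updates=()):
    """Combine planned updates with those produced while writing data."""
    return {
        "requirements": list(plan.requirements),
        "updates": list(plan.updates) + list(extra_updates),
    }


plan = StagedPlan(
    mode="replace",
    location="s3://bucket/db/t",  # placeholder
    field_ids={"id": 1, "email": 3},
    requirements=[
        {"type": "assert-table-uuid",
         "uuid": "00000000-0000-0000-0000-000000000000"},  # placeholder
    ],
    updates=[{"action": "set-location", "location": "s3://bucket/db/t"}],
)

# After writing data files, the client appends its own snapshot update.
body = to_commit_body(
    plan,
    extra_updates=[{"action": "add-snapshot", "snapshot": {"snapshot-id": 42}}],
)
```

The point of the shape is that the client never computes requirements or IDs itself; it only appends the updates that result from the files it wrote.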
   
   ### Expected behavior
   
   A REST-level staged create-or-replace primitive should allow a non-Java 
client to implement:
   
   ```sql
   CREATE OR REPLACE TABLE t AS SELECT ...
   ```
   
   without using `DROP TABLE` + `CREATE TABLE`.
   
   For an existing table, the expected behavior would be:
   
   * preserve transactionality at the catalog metadata level
   * avoid any window where the table disappears
   * build replacement metadata correctly
   * fail on concurrent conflicting changes instead of silently overwriting them
   * allow the new snapshot to become the current table state
   * optionally keep previous snapshots/history according to Iceberg semantics
   
   For a missing table, it should behave like a staged create.
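
The "fail on concurrent conflicting changes" expectation can be sketched with a toy catalog that checks requirements before applying a commit. `ToyCatalog` and `CommitConflict` are hypothetical. The requirement type names follow the existing REST commit request, but which requirements a real staged replace would carry is an assumption made here for illustration.

```python
class CommitConflict(Exception):
    """Raised when a requirement no longer holds at commit time."""


class ToyCatalog:
    """Hypothetical in-memory catalog with optimistic concurrency checks."""

    def __init__(self):
        self.tables = {}  # name -> {"uuid": str, "main": snapshot id}

    def commit(self, name, requirements, new_snapshot_id):
        current = self.tables.get(name)
        for req in requirements:
            kind = req["type"]
            if kind == "assert-create" and current is not None:
                raise CommitConflict("table already exists")
            if kind == "assert-table-uuid":
                if current is None or current["uuid"] != req["uuid"]:
                    raise CommitConflict("table UUID does not match")
            if kind == "assert-ref-snapshot-id":
                if current is None or current[req["ref"]] != req["snapshot-id"]:
                    raise CommitConflict("ref moved since planning")
        # All checks passed: swap in the new state in a single step.
        uuid = current["uuid"] if current else "uuid-1"
        self.tables[name] = {"uuid": uuid, "main": new_snapshot_id}


catalog = ToyCatalog()
catalog.commit("db.t", [{"type": "assert-create"}], new_snapshot_id=1)

# Requirements captured when the replace was planned (main was at snapshot 1).
planned = [
    {"type": "assert-table-uuid", "uuid": "uuid-1"},
    {"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": 1},
]

catalog.commit("db.t", planned, new_snapshot_id=2)  # a concurrent writer wins

try:
    catalog.commit("db.t", planned, new_snapshot_id=3)  # our stale replace
    conflict_detected = False
except CommitConflict:
    conflict_detected = True
```

The stale replace is rejected and the table stays at the concurrent writer's snapshot, so the table never disappears and no change is silently overwritten.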
   
   ### Relationship to existing issues
   
   I saw issue #16232, which discusses correctness problems around `REPLACE 
TABLE` transactions and concurrent committed changes.
   
   This proposal is related, but not the same. #16232 is about making replace 
transactions safe/correct. This issue is about exposing a high-level REST 
primitive so non-Java clients can access equivalent create-or-replace 
transaction semantics without reimplementing the Java logic or falling back to 
non-atomic drop/create behavior.
   
   ### Question
   
   Would the Iceberg community be open to adding a REST Catalog staged 
create-or-replace / staged replace transaction API?
   
   If yes, what would be the preferred design direction?
   
   * Extend staged create?
   * Add a staged replace endpoint?
   * Add a staged create-or-replace endpoint?
   * Or is the expectation that every non-Java REST client should reconstruct 
the replacement transaction locally using `CommitTableRequest` requirements and 
updates?
   
   
   ### Query engine
   
   Other
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


