Re: [DISCUSS] Iceberg REST Catalog Idempotency

Dmitri Bourlatchkov Wed, 29 Oct 2025 12:50:15 -0700

Hi All,

From my POV (and I may be repeating what I put in GH comments), the main
point in using UUID v7 is specifying that a timestamp should be part of the
idempotency key. As previously discussed, having this timestamp is
beneficial to server implementations.
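
To illustrate the point about the embedded timestamp: per RFC 9562, the first 48 bits of a UUIDv7 hold the Unix time in milliseconds. The following is only a sketch of how a server could read that value; the function name `uuid7_timestamp` is made up for this example and is not part of any Iceberg or Polaris API.

```python
import uuid
from datetime import datetime, timezone


def uuid7_timestamp(key: str) -> datetime:
    """Return the creation time embedded in a UUIDv7 string (hypothetical helper)."""
    parsed = uuid.UUID(key)
    if parsed.version != 7:
        raise ValueError(f"expected a version 7 UUID, got version {parsed.version}")
    # RFC 9562: the most significant 48 bits are unix_ts_ms.
    millis = parsed.int >> 80
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Example value taken from RFC 9562, Appendix A.6.
print(uuid7_timestamp("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
```

Whether a server should act on this value (for expiry, for example) or only log it is exactly what this thread is debating.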


The IETF Idempotency Key draft v7 [1] allows servers to require specific ID
generation algorithms.

We could have a custom ID format, but UUID v7 is already defined and fits
this use case.

If for some reason UUID v7 becomes "weak" in the future, such an event will
have an impact far beyond the REST Catalog API. In any case, if that
happens, nothing prevents revising the REST API spec to allow for
stronger ID generators.

[1]
https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client

Cheers,
Dmitri.

On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu <[email protected]> wrote:

> +1 on option 2: don’t mandate a specific key format.
>
> Concerns with option 1 (UUIDv7-mandatory):
> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck
> with a brittle contract.
> 2. Unnecessary constraints. It binds both client and server
> implementations. One of IRC’s goals is to simplify client work; forcing
> UUIDv7 limits client choices for marginal gain (the embedded timestamp).
>
> Here are existing implementations for reference:
>
>    - Stripe[1]: recommends UUIDv4 but does not enforce a format for
>    idempotency keys.
>    - AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
>    characters for the client token.
>
> I'd propose to treat the idempotency key as an opaque string with basic
> requirements and guidance (e.g., “unique string values; UUIDv4 or v7 are
> fine”) but avoid making the format mandatory. This keeps the API
> future-proof and client-friendly while preserving server-side flexibility.
>
> 1. https://docs.stripe.com/api/idempotent_requests
> 2.
> https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
>
> Yufei
>
>
> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao <[email protected]> wrote:
>
>> Hi Yun,
>> Thanks for the thoughtful feedback!
>>
>> Yes, the key itself is expected to be globally unique. You’re also right
>> that we don’t need to mandate UUIDs to achieve that; other schemes can
>> provide global uniqueness.
>>
>> I have chosen UUID because several folks in the community prefer it as a
>> common, interoperable choice. That said, I agree that mandating UUIDv7 adds
>> constraints on clients without clear spec-level benefit.
>>
>> I also agree we should separate spec from implementation; details like
>> the key generation method can live in implementation guidance.
>>
>> From your note, it sounds like you support Option 2
>> (version-agnostic)—i.e., require a “globally unique idempotency key” and
>> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
>> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
>> as a +1 for Option 2.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Fri, Oct 24, 2025 at 7:00 PM yun zou <[email protected]>
>> wrote:
>>
>>> Sorry, I accidentally sent the email before it was complete; please ignore my
>>> previous email. Sorry for the noise and inconvenience.
>>>
>>> Hi Huaxin,
>>>
>>> This is a really interesting and valuable proposal — it provides a
>>> great way to address the issue of duplicate client requests. Thank you
>>> for proposing and driving this forward!
>>>
>>> One point that isn’t entirely clear to me is how the server uniquely
>>> identifies each request.  Are we relying solely on the idempotency-key
>>> being globally unique, or is there an additional identifier such as
>>> clientId + idempotency-key? Based on the current discussion, it sounds
>>> like the proposal expects the key itself to be globally unique, likely
>>> through the use of a UUID, but I’d like to double-check my
>>> understanding.
>>>
>>> If we are indeed relying on the client to generate a globally unique
>>> ID, that approach makes sense. However, it doesn’t seem necessary to
>>> mandate the use of UUIDs, as there are other valid methods for
>>> achieving global uniqueness. Imposing a further restriction to UUIDv7
>>> would place additional constraints on the client implementation.
>>>
>>> From a specification perspective, I think it would be better to
>>> separate the spec from the implementation. In other words, we should
>>> make it clear that the key must be globally unique, but we don’t need
>>> to specify that it must be a UUID or UUIDv7.
>>>
>>> Best Regards,
>>> Yun
>>>
>>> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao <[email protected]>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Thank you for taking the time to review my proposal and PR—I really
>>> appreciate the input.
>>> >
>>> > There’s one remaining issue I’d like to settle. In the Iceberg Catalog
>>> Community sync, many preferred mandating UUIDv7 for the idempotency key. At
>>> the same time, there are some concerns:
>>> >
>>> > If we need a timestamp, it should be a separate field; we shouldn’t
>>> use the UUIDv7 timestamp.
>>> >
>>> > If we use the UUID timestamp for expiry, we’d have to require keys to
>>> be generated at request time, which feels over-engineered.
>>> >
>>> > If we want to use the UUIDv7 timestamp, it should be for debugging
>>> only.
>>> >
>>> > Based on that, here’s a draft update to the spec:
>>> >
>>> > Key Requirements:
>>> > - Key format: UUIDv7 in string format as defined in RFC 9562.
>>> >   See
>>> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value
>>> .
>>> > - The idempotency key must be globally unique (no reuse across
>>> different operations).
>>> > - Catalogs SHOULD NOT expire keys before the end of the advertised
>>> token lifetime.
>>> > - If Idempotency-Key is used, clients MUST reuse the same key when
>>> retrying the same
>>> >   logical operation and MUST generate a new key for a different
>>> operation.
>>> > - Server behavior: Servers MUST validate the syntactic validity of
>>> UUIDv7 (per RFC 9562).
>>> >   Servers MUST NOT make behavioral decisions based on the UUID’s
>>> internal timestamp fields.
>>> >   The idempotency key is an opaque, unique identifier used only for
>>> lookup/deduplication.
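
A minimal sketch of the "syntactic validity only" server check from the draft requirements above, assuming the canonical 8-4-4-4-12 hex text form from RFC 9562. The helper name `is_syntactically_valid_uuid7` is hypothetical. It checks only the version nibble (7) and the variant bits, and deliberately never reads the timestamp.

```python
import re

# Canonical textual form: version nibble must be 7, variant nibble must be 8, 9, a, or b.
_UUID7_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_syntactically_valid_uuid7(key: str) -> bool:
    """Hypothetical helper: accept only well-formed UUIDv7 strings."""
    return _UUID7_RE.fullmatch(key) is not None


print(is_syntactically_valid_uuid7("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
print(is_syntactically_valid_uuid7("123e4567-e89b-42d3-a456-426614174000"))
```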
>>> >
>>> > This reads a bit awkwardly to me: we mandate UUIDv7 but prohibit using
>>> its timestamp, which seems to undercut the reason to require v7 in the
>>> first place.
>>> >
>>> > I’d appreciate feedback on whether we should:
>>> >
>>> > Option 1 — Require v7.
>>> > Keep UUIDv7 required, with the server restrictions above (syntactic v7
>>> validation only; no behavioral decisions based on the embedded timestamp).
>>> >
>>> > Option 2 — Version-agnostic.
>>> > Make the client spec version-agnostic (require RFC 9562 UUID textual
>>> form; allow v7 as a recommendation). Leave any timestamp/lifetime mechanics
>>> to a server-side (Polaris idempotency) document.
>>> >
>>> > Thanks again for the thoughtful discussion.
>>> >
>>> > Best,
>>> >
>>> > Huaxin
>>> >
>>> >
>>> > On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov <[email protected]>
>>> wrote:
>>> >>
>>> >> Hi Huaxin,
>>> >>
>>> >> Sorry about the delay. I posted some comments on
>>> https://github.com/apache/iceberg/pull/14196 Some of them I might have
>>> mentioned on the doc too, so apologies if they got answered in the doc and
>>> I missed it.
>>> >>
>>> >> Cheers,
>>> >> Dmitri.
>>> >>
>>> >> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao <[email protected]>
>>> wrote:
>>> >>>
>>> >>> Thank you all for taking the time to review and discuss! I’ve
>>> responded to all questions and updated the proposal. If there are no
>>> additional concerns, I’ll proceed to start a VOTE thread.
>>> >>>
>>> >>> Thanks,
>>> >>> Huaxin
>>> >>>
>>> >>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <
>>> [email protected]> wrote:
>>> >>>>
>>> >>>> +1 for low-level retry, which ensures that the idempotency key is
>>> never committed twice. I also agree that canonicalizing the request body,
>>> which the client can change due to conflict resolution and retry, would
>>> be hard to get right.
>>> >>>>
>>> >>>> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>> +1 to this being mostly targeting a "low-level" retry semantic.
>>> Expanding on that though I'd say even "client-side retries" really have two
>>> distinct flavors:
>>> >>>>>
>>> >>>>> A. Business-logic-agnostic retries, e.g. in a common low-level
>>> HTTP client library - behaviorally, these should behave largely the same as
>>> "network infra retries". The key distinction is that in this case any
>>> content hashing would be *post* serialization and even agnostic to
>>> request-body content-type (i.e. not JSON-specific).
>>> >>>>> B. Application-specific retries, such as when the Iceberg client may
>>> rebase onto a new snapshot.
>>> >>>>>
>>> >>>>> I think this aligns with what Peter and others mentioned earlier
>>> where trying to canonicalize the *semantic* content of a request is
>>> probably brittle/risky. And as Yufei mentions, case 2.B (client-side real
>>> application-layer retries) should be using a new idempotency-key if it's
>>> ever doing the retry at the layer that requires re-serializing JSON.
>>> >>>>>
>>> >>>>> Overall though I agree making the content-hash checking optional
>>> is a good idea.
>>> >>>>>
>>> >>>>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <[email protected]>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Thanks, Peter and Yufei. I agree the main use case is
>>> network‑infrastructure retries. To keep the specification simple and move
>>> the proposal forward, let’s make the baseline key‑only idempotency. If
>>> there’s demand, we can add an optional payload‑binding mode (canonical JSON
>>> + SHA‑256), advertised via /v1/config.
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>>
>>> >>>>>> Huaxin
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <[email protected]>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> "Network infrastructure retries" would be the dominant use case.
>>> I'd NOT recommend that clients retry with the same idempotency key if they
>>> regenerated the request; instead, clients should reload before retrying in
>>> that case.
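
A rough client-side sketch of the low-level retry case described above: one key is generated per logical request and reused for every network-level retry of the same serialized bytes. The `send` callable and the function name `send_with_retries` are placeholders for whatever HTTP layer a client uses, and `uuid4` is used only because the key format is still under discussion in this thread.

```python
import uuid
from typing import Callable, Dict, Optional


def send_with_retries(
    send: Callable[[Dict[str, str], bytes], int],
    body: bytes,
    max_attempts: int = 3,
) -> int:
    """Retry the exact same bytes with the exact same Idempotency-Key."""
    # The key is generated once, before the first attempt, and never regenerated.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError as err:
            # Response lost in transit: safe to resend because the key is unchanged.
            last_error = err
    assert last_error is not None
    raise last_error
```

If the client rebuilds the request (for example after reloading table metadata), that is a new logical operation and would call `send_with_retries` again, which produces a fresh key.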
>>> >>>>>>>
>>> >>>>>>> Yufei
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <
>>> [email protected]> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi Huaxin,
>>> >>>>>>>>
>>> >>>>>>>> Could you clarify the specific use cases we intend to support
>>> regarding retry checking? Here are a couple of possibilities I had in mind:
>>> >>>>>>>>
>>> >>>>>>>> Network infrastructure retries – where the exact same request
>>> is retried.
>>> >>>>>>>> Client-side retries – where the client regenerates the request
>>> using the same program logic, resulting in identical content.
>>> >>>>>>>>
>>> >>>>>>>> If there are no security or other concerns, I’d suggest keeping
>>> the specification simple and avoiding mechanisms that surface client-side
>>> implementation errors. The cleanest approach might be to ignore the request
>>> content and rely solely on a user-provided key.
>>> >>>>>>>>
>>> >>>>>>>> Alternatively, we could include an optional error code in the
>>> response, which implementations may use to signal conflicts. The actual
>>> conflict detection logic can be left to the implementations—we don’t need
>>> to define it in the specification. If we go this route, we should also
>>> offer a way to disable these checks, since there will inevitably be cases
>>> where semantically identical requests are incorrectly flagged as
>>> conflicting.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Peter
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Sep 19, 2025 at 1:38 AM huaxin gao <[email protected]>
>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks Steven for the +1 and for raising the fingerprint
>>> question! Great points!
>>> >>>>>>>>>
>>> >>>>>>>>> What we need to protect against:
>>> >>>>>>>>>
>>> >>>>>>>>> Same logical request, different bytes across retries (pretty
>>> vs compact JSON, map key order, ...).
>>> >>>>>>>>> Accidental key reuse with a changed payload.
>>> >>>>>>>>>
>>> >>>>>>>>> Options and tradeoffs:
>>> >>>>>>>>>
>>> >>>>>>>>> Exact byte checksum (e.g., SHA‑256 over raw body)
>>> >>>>>>>>>
>>> >>>>>>>>> Pro: trivial, fast
>>> >>>>>>>>> Con: too strict; benign diffs cause false mismatches
>>> >>>>>>>>>
>>> >>>>>>>>> Canonical JSON over full request, then hash (proposed)
>>> >>>>>>>>>
>>> >>>>>>>>> Pro: stable across whitespace/key order; simple to implement
>>> for typed payloads
>>> >>>>>>>>> Con: slightly more work than a raw checksum
>>> >>>>>>>>>
>>> >>>>>>>>> Checksum of selected fields / field-by-field match
>>> >>>>>>>>>
>>> >>>>>>>>> Pro: can be faster for huge payloads; can ignore noisy fields
>>> >>>>>>>>> Con: could miss legitimate differences
>>> >>>>>>>>>
>>> >>>>>>>>> Request digest/signature
>>> >>>>>>>>>
>>> >>>>>>>>> Pro: very strong
>>> >>>>>>>>> Con: heavyweight
>>> >>>>>>>>>
>>> >>>>>>>>> Maybe we could make this configurable:
>>> >>>>>>>>>
>>> >>>>>>>>> canonical-json-sha256 (default)
>>> >>>>>>>>> raw-bytes-sha256 (strict)
>>> >>>>>>>>> trust-client-key (no fingerprint check)
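
To make the difference between the first two modes concrete, here is a small sketch. It approximates "canonical JSON" with sorted keys and compact separators from Python's standard `json` module, which is an assumption for illustration and not a full canonicalization scheme such as RFC 8785. The function names are hypothetical.

```python
import hashlib
import json


def raw_bytes_sha256(body: bytes) -> str:
    """Strict mode: hash the request body exactly as received."""
    return hashlib.sha256(body).hexdigest()


def canonical_json_sha256(body: bytes) -> str:
    """Parse, re-serialize with sorted keys and no extra whitespace, then hash."""
    parsed = json.loads(body)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


pretty = b'{"a": 1, "b": {"c": [1, 2]}}'
compact = b'{"b":{"c":[1,2]},"a":1}'

# Same logical request, different bytes.
print(raw_bytes_sha256(pretty) == raw_bytes_sha256(compact))
print(canonical_json_sha256(pretty) == canonical_json_sha256(compact))
```

The raw hashes differ while the canonical hashes agree, which is the "benign diffs" trade-off listed above. Note that array order is still significant in this sketch.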
>>> >>>>>>>>>
>>> >>>>>>>>> On the IETF draft status:
>>> >>>>>>>>>
>>> >>>>>>>>> I have also noted the draft’s expiry. We will align with its
>>> semantics for now and can adjust if a new version lands.
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks,
>>> >>>>>>>>>
>>> >>>>>>>>> Huaxin
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <
>>> [email protected]> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> +1 for the feature that can make retry safe for 500s and
>>> improve the client fault-tolerance of transient server failures.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Peter and Dmitri raised a good question on the fingerprint.
>>> The IETF draft doesn't actually define the fingerprint algo. We can also go
>>> with a simple checksum of the entire request payload, which would be cheap to
>>> compute. Do we anticipate any scenarios where clients may
>>> rewrite the payload in different forms of serialized bytes during retries?
>>> >>>>>>>>>>
>>> >>>>>>>>>>    *  Checksum of the entire request payload.
>>> >>>>>>>>>>    *  Checksum of selected element(s) in the request payload.
>>> >>>>>>>>>>    *  Field value match for each field in the request payload.
>>> >>>>>>>>>>    *  Field value match for selected element(s) in the
>>> request payload.
>>> >>>>>>>>>>    *  Request digest/signature
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> BTW, the IETF draft seems to have expired without approval
>>> >>>>>>>>>>
>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I
>>> really appreciate you taking a close look at my proposal. I agree that
>>> "semantic equality" is tricky, that's why the scope here is intentionally
>>> narrow.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Just to clarify scope: I’m not trying to solve general
>>> semantic equivalence. For these specific, typed request payloads, I
>>> serialize to a deterministic JSON and hash it. That normalizes benign diffs
>>> (map order, whitespace) without trying to infer meaning. The goal is a
>>> stable fingerprint so that if a key is accidentally reused with a changed
>>> payload, we surface that instead of silently diverging.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> To make this feel less brittle, I’ll add tests for the
>>> practical cases (ordering/whitespace, nested maps, a clear null‑vs‑missing
>>> rule, numeric formatting), plus end‑to‑end tests in the in‑memory REST
>>> fixture with failure injection (in‑flight dup, finalize failure ->
>>> reconcile, etc.). Happy to walk through these if helpful.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I’m also open to adding a config switch for
>>> “trust‑client‑key only” if that’s preferred in some environments. My intent
>>> is to stay aligned with the IETF Idempotency‑Key guidance (first request
>>> wins; conflicting reuse is rejected, and reusing a key with a different
>>> request payload is rejected via an idempotency fingerprint) while keeping
>>> things as simple as possible and protecting us from accidental key misuse.
>>> Would love to align on the lightest approach that meets those goals.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Huaxin
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Hi All,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I agree that checking request contents is almost redundant
>>> in this case.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> If the randomness quality of Idempotency-Key value is good,
>>> collisions are very unlikely on the server side. Given that, any content
>>> checks the server performs are essentially validating that clients
>>> correctly reuse the generated Idempotency-Key value. (this is mostly the
>>> same as my comment on the related Polaris discussion).
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I'd like to propose making the content check optional so
>>> that servers may or may not implement it according to their design
>>> principles and constraints and emphasizing that clients should use unique
>>> keys (e.g. UUIDs)... basically going with option 2 from Peter's email.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I believe this is in line with the SHOULD word used for
>>> this case in the IETF draft [1] (section 2.7).
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> [1]
>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>> Dmitri.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late
>>> review - I had a bit of a busy week.
>>> >>>>>>>>>>>>> I have one main question, which I have also added as a
>>> comment to the doc:
>>> >>>>>>>>>>>>> - Why do we try to compare the request contents when the
>>> Idempotency-Key is the same for the requests? The comparison algorithm is a
>>> bit complicated, and seems brittle to me. Consistent field ordering, maps,
>>> and maybe even inconsistency in upper case/lower case letters might mean
>>> technically the same request.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> In my previous roles (admittedly more than 10 years ago) I
>>> was extensively working on APIs like this, and we have never really
>>> succeeded in creating a good enough "are these 2 requests really the
>>> same semantically" check.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> I would simplify these requirements, unless there are
>>> serious arguments for the existence of these checks:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Either check for exact matches - without any magic - this
>>> could be used for detecting issues where the duplication happens on the
>>> network side, or
>>> >>>>>>>>>>>>> Rely entirely on the clients to provide the correct
>>> Idempotency-Key.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> I would prefer the 2nd.
>>> >>>>>>>>>>>>> Otherwise I agree with the contents of the proposal. It is
>>> nicely done! (edited)
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Thu, Sep 18, 2025 at 2:54 AM Yufei Gu <[email protected]>
>>> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Thanks for the proposal. It's a nice feature to make
>>> retry more reliable and efficient. Left some comments.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Yufei
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Thanks for writing up the proposal! Makes sense to add
>>> idempotency to mutation requests.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> It would be helpful to add this feature to both the
>>> catalog test framework and the iceberg-rest-fixture. The latter is used by
>>> the subprojects for testing and would come in handy when we want to test
>>> out the client implementation.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> For other reviewers, the Stripe documentation on
>>> idempotency was a helpful read,
>>> https://docs.stripe.com/api/idempotent_requests.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>> Kevin Liu
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Hi,
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Sounds like fairly standard practice and makes sense to
>>> me on the first read.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>>>>> Szehon
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
>>> [email protected]> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I think based on the feedback on the proposal and in
>>> recent syncs we should probably move forward with the actual Spec Change PR
>>> so we can see what this looks like and move on to a discussion of how the
>>> Catalog test framework should test this.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>> >>>>>>>>>>>>>>>>> > Hi all,
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > I’d like to propose a change to Iceberg’s REST API
>>> to make mutation
>>> >>>>>>>>>>>>>>>>> > requests safely retryable.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *The Problem*
>>> >>>>>>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds in
>>> the catalog but the
>>> >>>>>>>>>>>>>>>>> > client doesn’t receive the response (timeout,
>>> connection closed, etc.), a
>>> >>>>>>>>>>>>>>>>> > second attempt can hit 409 Conflict. The client
>>> interprets the 409 as a
>>> >>>>>>>>>>>>>>>>> > failed commit and deletes the associated metadata
>>> files, causing
>>> >>>>>>>>>>>>>>>>> > catalog/storage inconsistency.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *The Proposed Solution*
>>> >>>>>>>>>>>>>>>>> > Introduce an optional Idempotency-Key HTTP header
>>> on REST mutation
>>> >>>>>>>>>>>>>>>>> > endpoints and have the Iceberg client pass it through.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *Semantics *(first processed request wins):
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> >    -
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> >    Same key + same canonical payload -> return the
>>> original result (no
>>> >>>>>>>>>>>>>>>>> >    re-execution).
>>> >>>>>>>>>>>>>>>>> >    -
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> >    Same key + different payload -> 422
>>> (Unprocessable Content).
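
The "first processed request wins" semantics above could be sketched roughly as follows. This is an in-memory illustration only: the class `IdempotencyStore`, its `handle` method, and the fingerprint argument are made-up names, and a real catalog would also need persistence, expiry, and handling of concurrent in-flight duplicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class StoredResult:
    fingerprint: str
    status: int
    body: str


class IdempotencyStore:
    """Hypothetical single-process store keyed by Idempotency-Key."""

    def __init__(self) -> None:
        self._entries: Dict[str, StoredResult] = {}

    def handle(
        self, key: str, fingerprint: str, execute: Callable[[], Tuple[int, str]]
    ) -> Tuple[int, str]:
        existing = self._entries.get(key)
        if existing is not None:
            if existing.fingerprint != fingerprint:
                # Same key, different payload: reject without executing.
                return 422, "idempotency key reused with a different payload"
            # Same key, same payload: replay the original result.
            return existing.status, existing.body
        # First time this key is seen: execute once and remember the outcome.
        status, body = execute()
        self._entries[key] = StoredResult(fingerprint, status, body)
        return status, body
```

In a key-only variant, the fingerprint comparison would simply be dropped and every repeat of the key would replay the stored result.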
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise
>>> support and retention so
>>> >>>>>>>>>>>>>>>>> > clients know when a retry is safe, e.g.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > {
>>> >>>>>>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
>>> >>>>>>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m" }
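
A sketch of how a client might use the advertised capability to decide whether a retry with the same key is still safe. The two property names come from the example above. The duration syntax is an assumption: only "30m" appears in the proposal, so this parser accepts a number followed by `s`, `m`, or `h` purely for illustration, and both function names are hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_lifetime(value: str) -> timedelta:
    """Parse strings such as '30m' (assumed format, see note above)."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported lifetime format: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def retry_is_safe(config: dict, elapsed_since_first_attempt: timedelta) -> bool:
    """Retry with the same key only if the catalog still remembers it."""
    if not config.get("idempotency-tokens-respected", False):
        return False
    lifetime = parse_lifetime(config["idempotency-token-lifetime"])
    return elapsed_since_first_attempt < lifetime


config = {"idempotency-tokens-respected": True, "idempotency-token-lifetime": "30m"}
print(retry_is_safe(config, timedelta(minutes=5)))
print(retry_is_safe(config, timedelta(minutes=45)))
```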
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI to include
>>> the header, and add
>>> >>>>>>>>>>>>>>>>> > client pass-through + honoring capability discovery.
>>> No server
>>> >>>>>>>>>>>>>>>>> > implementation is mandated—catalogs (e.g., Polaris)
>>> can implement
>>> >>>>>>>>>>>>>>>>> > storage/TTL/replay as they choose.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard
>>> header name and matches
>>> >>>>>>>>>>>>>>>>> > the IETF HTTPAPI Idempotency-Key draft
>>> >>>>>>>>>>>>>>>>> > <
>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header
>>> >
>>> >>>>>>>>>>>>>>>>> > semantics.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > *Compatibility:* fully backward compatible. Servers
>>> that don’t support it
>>> >>>>>>>>>>>>>>>>> > can ignore the header; clients can detect support
>>> via capability discovery.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > Here is the proposal
>>> >>>>>>>>>>>>>>>>> > <
>>> https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0
>>> >.
>>> >>>>>>>>>>>>>>>>> > Looking forward to your thoughts.
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > Thanks,
>>> >>>>>>>>>>>>>>>>> >
>>> >>>>>>>>>>>>>>>>> > Huaxin
>>> >>>>>>>>>>>>>>>>> >
>>>
>>
