Re: [DISCUSS] Iceberg REST Catalog Idempotency

huaxin gao Thu, 18 Sep 2025 15:46:03 -0700

Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate
you taking a close look at my proposal. I agree that "semantic equality" is
tricky, that's why the scope here is intentionally narrow.


Just to clarify scope: I’m not trying to solve general semantic
equivalence. For these specific, typed request payloads, I serialize to a
deterministic JSON and hash it. That normalizes benign diffs (map order,
whitespace) without trying to infer meaning. The goal is a stable
fingerprint so that if a key is accidentally reused with a changed payload,
we surface that instead of silently diverging.

To make this feel less brittle, I’ll add tests for the practical cases
(ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric
formatting), plus end‑to‑end tests in the in‑memory REST fixture with
failure injection (in‑flight dup, finalize failure -> reconcile, etc.).
Happy to walk through these if helpful.

I’m also open to adding a config switch for “trust‑client‑key only” if
that’s preferred in some environments. My intent is to stay aligned with
the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is
rejected, and reusing a key with a different request payload is rejected
via an idempotency fingerprint) while keeping things as simple as possible
and protecting us from accidental key misuse. Would love to align on the
lightest approach that meets those goals.

Thanks,

Huaxin

On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi All,
>
> I agree that checking request contents is almost redundant in this case.
>
> If the randomness quality of Idempotency-Key value is good, collisions are
> very unlikely on the server side. Given that, any content checks the server
> performs are essentially validating that clients correctly reuse the
> generated Idempotency-Key value. (this is mostly the same as my comment on
> the related Polaris discussion).
>
> I'd like to propose making the content check optional so that servers may
> or may not implement it according to their design principles and
> constraints and emphasizing that clients should use unique keys (e.g.
> UUIDs)... basically going with option 2 from Peter's email.
>
> I believe this is in line with the SHOULD word used for this case in the
> IETF draft [1] (section 2.7).
>
> [1]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>
> Thanks,
> Dmitri.
>
> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <[email protected]>
> wrote:
>
>> Thanks Huaxin for the proposal, and sorry for the late review - I had a
>> bit of a busy week.
>> I have one main question, which I have also added as a comment to the doc:
>> - Why do we try to compare the request contents when the Idempotency-Key
>> is the same for the requests? The comparison algorithm is a bit
>> complicated, and seems brittle to me. Consistent field ordering, maps, and
>> maybe even inconsistency in upper case/lower case letters might mean
>> technically the same request.
>>
>> In my previous roles (admittedly more than 10 years ago) I was
>> extensively working on APIs like this, and we have never really succeeded
>> in creating a good enough "are these 2 requests are really the same
>> semantically" checks.
>>
>> I would simplify these requirements, unless there are serious arguments
>> for the existence of these checks:
>>
>>    1. Either check for exact matches - without any magic - this could be
>>    used for detecting issues where the duplication happens on the network
>>    side, or
>>    2. Rely entirely on the clients to provide the correct
>>    Idempotency-Key.
>>
>> I would prefer the 2nd.
>> Otherwise I agree with the contents of the proposal. It is nicely done!
>> (edited)
>>
>> Yufei Gu <[email protected]> ezt írta (időpont: 2025. szept. 18., Cs,
>> 2:54):
>>
>>> Thanks for the proposal. It's a nice feature to make retry more reliable
>>> and efficient. Left some comments.
>>>
>>> Yufei
>>>
>>>
>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <[email protected]> wrote:
>>>
>>>>
>>>> Thanks for writing up the proposal! Makes sense to add idempotency to
>>>> mutation requests.
>>>>
>>>> It would be helpful to add this feature to both the catalog test
>>>> framework and the iceberg-rest-fixture
>>>> <https://github.com/apache/iceberg/blob/754679ddccdf81a97dc65d40f1a2a6fb9f6ee9b0/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java#L112>.
>>>> The latter is used by the subprojects for testing and would come in handy
>>>> when we want to test out the client implementation.
>>>>
>>>> For other reviewers, the Stripe documentation on idempotency was a
>>>> helpful read, https://docs.stripe.com/api/idempotent_requests.
>>>>
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sounds like fairly standard practice and makes sense to me in the
>>>>> first read.
>>>>>
>>>>> Thanks,
>>>>> Szehon
>>>>>
>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I think based on the feedback on the proposal and in recent syncs we
>>>>>> should probably move forward with the actual Spec Change PR so we can see
>>>>>> what this looks like and move on to a discussion of how the Catalog test
>>>>>> framework should test this.
>>>>>>
>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>>>> > Hi all,
>>>>>> >
>>>>>> > I’d like to propose a change to Iceberg’s REST API to make mutation
>>>>>> > requests safely retryable.
>>>>>> >
>>>>>> > *The Problem*
>>>>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but
>>>>>> the
>>>>>> > client doesn’t receive the response (timeout, connection closed,
>>>>>> etc.), a
>>>>>> > second attempt can hit 409 Conflict. The client interprets the 409
>>>>>> as a
>>>>>> > failed commit and deletes the associated metadata files, causing
>>>>>> > catalog/storage inconsistency.
>>>>>> >
>>>>>> > *The Proposed Solution*
>>>>>> > Introduces an optional Idempotency-Key HTTP header on REST mutation
>>>>>> > endpoints and has the Iceberg client pass it through.
>>>>>> >
>>>>>> > *Semantics *(first processed request wins):
>>>>>> >
>>>>>> >    -
>>>>>> >
>>>>>> >    Same key + same canonical payload -> return the original result
>>>>>> (no
>>>>>> >    re-execution).
>>>>>> >    -
>>>>>> >
>>>>>> >    Same key + different payload -> 422 (Unprocessable Content).
>>>>>> >
>>>>>> > *Capability discovery:* catalogs can advertise support and
>>>>>> retention so
>>>>>> > clients know when a retry is safe, e.g.
>>>>>> >
>>>>>> > {
>>>>>> >   "idempotency-tokens-respected": true,
>>>>>> >   "idempotency-token-lifetime": "30m" }
>>>>>> >
>>>>>> > *Scope in Iceberg:* update the OpenAPI to include the header, and
>>>>>> add
>>>>>> > client pass-through + honoring capability discovery. No server
>>>>>> > implementation is mandated—catalogs (e.g., Polaris) can implement
>>>>>> > storage/TTL/replay as they choose.
>>>>>> >
>>>>>> > *Standards alignment:* uses the industry-standard header name and
>>>>>> matches
>>>>>> > the IETF HTTPAPI Idempotency-Key draft
>>>>>> > <
>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header
>>>>>> >
>>>>>> > semantics.
>>>>>> >
>>>>>> > *Compatibility:* fully backward compatible. Servers that don’t
>>>>>> support it
>>>>>> > can ignore the header; clients can detect support via capability
>>>>>> discovery.
>>>>>> >
>>>>>> > Here is the proposal
>>>>>> > <
>>>>>> https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0
>>>>>> >.
>>>>>> > Looking forward to your thoughts.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Huaxin
>>>>>> >
>>>>>>
>>>>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

Reply via email to