Hi Huaxin,

Thanks for mentioning the related IETF draft. Based on the links from your
Iceberg doc, I was able to find only revision 6 of this draft [2].

This revision is already expired, which is a bit of a surprise to me :) Do
you know of a newer revision?

In any case, I believe it would be preferable to use the IETF draft as the
basis for the specification of client/server interactions in the REST API
and to put in the Iceberg/Polaris-specific docs only the information where
we extend the IETF proposal.

That said, I guess this discussion thread becomes a discussion of possible
ways to implement idempotency keys in Polaris.
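
For reference while comparing implementations, here is a minimal sketch of
the client-visible contract from Huaxin's message quoted below. It is only
an illustration under stated assumptions: the names IdempotencyStore, Entry
and handle are hypothetical, the store is a single-process in-memory dict,
and key expiry (idempotency-key-lifetime) is left out. It does not reflect
any existing Polaris code, and either a tasks-based or a standalone backend
could sit behind the same behaviour.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]


@dataclass
class Entry:
    payload_hash: str
    status: Optional[int] = None  # None while the first request is in flight
    body: Optional[dict] = None


class IdempotencyStore:
    """Hypothetical in-memory store; a real server needs shared, expiring state."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def handle(self, key: str, payload: dict,
               execute: Callable[[dict], Response]) -> Response:
        # Bind the key to a hash of the canonicalized (sorted-key) JSON payload.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()).hexdigest()
        entry = self._entries.get(key)
        if entry is not None:
            if entry.payload_hash != digest:
                return 422, {"error": "key reused with a different payload"}
            if entry.status is None:
                return 409, {"error": "original request still in progress"}
            return entry.status, entry.body  # replay, no re-execution

        entry = Entry(digest)
        self._entries[key] = entry
        try:
            status, body = execute(payload)
        except Exception:
            del self._entries[key]  # nothing finalized, allow a clean retry
            raise
        if status >= 500:
            del self._entries[key]  # transient 5xx results are never cached
        else:
            entry.status, entry.body = status, body  # 2xx or terminal 4xx
        return status, body
```

A retry with the same key and payload returns the stored response without
calling execute again, which is the case that currently surfaces as a
misleading 409 Conflict on a commit retry.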

I proposed an implementation based on "reliable Polaris tasks" [1].

If you have an alternative proposal in mind, it might be best to put it in
your Polaris document as a separate section (after the "spec" parts) for
further discussion.

I'd appreciate it if you added my tasks-based implementation proposal as an
alternative implementation section in that doc too, if you do not mind :)

[1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
[2] https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06

Cheers,
Dmitri.

On Tue, Sep 16, 2025 at 4:43 PM Huaxin Gao <[email protected]> wrote:

> Hi all, thanks for the thoughtful feedback (JB, Dmitri, Yufei). Sorry for
> the late reply. I missed the emails because I wasn’t subscribed to the dev
> list.
>
> The companion Iceberg REST Catalog proposal is here (discussion + spec
> text):
>
> https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0#heading=h.jfecktgonj1i
>
> The discussion thread is here:
> https://lists.apache.org/[email protected]
>
> The intent of the Polaris proposal is to implement those Iceberg
> client-visible semantics.
>
> I see the conceptual link to tasks/operations. For this proposal, I’d like
> to keep the scope small and client-visible. Concretely:
>
> Same key + same payload -> return the original result (no re-run)
>
> Same key + different payload -> 422
>
> If the first request is still running -> don’t start another (409 or
> wait-then-replay)
>
> Only replay finalized results (2xx or terminal 4xx)
>
> Advertise support via GET /v1/config (idempotency-key-supported,
> idempotency-key-lifetime)
>
> This aligns with the IETF Idempotency-Key draft. It’s intentionally
> lightweight, so I’d prefer to keep the scope simple for now rather than tie
> it to a tasks framework.
>
> Thanks,
> Huaxin
>
> On 2025/09/16 19:12:31 Yufei Gu wrote:
> > Hi Dimitri,
> >
> > Thanks for sharing your feedback. I see the analogy you’re making with
> > tasks/operations, but I don’t think it holds in this case.
> >
> > Idempotency and tasks solve very different problems:
> >
> >    - Idempotency is about safely retrying a synchronous request without
> >    causing side effects or inconsistencies. See more details in the IETF
> >    doc[1].
> >    - Tasks are about orchestrating and tracking asynchronous execution.
> >
> > Merging the two conceptually risks overcomplicating the problem. The
> > client doesn’t need or want to manage a long-lived task here, only to
> > ensure that a retry of the same payload doesn’t break consistency.
> >
> > I don’t think we should couple the idempotency proposal to tasks. Doing
> > so creates unnecessary dependencies and delays for a feature that is
> > valuable in its own right. Tasks may be useful for other scenarios, but
> > idempotency should remain a lightweight mechanism, independent of whether
> > the server executes work synchronously or asynchronously.
> >
> > So while there might be some surface-level similarities, I believe
> > treating idempotency as a standalone concern is the more pragmatic path
> > forward.
> >
> > [1] https://www.ietf.org/archive/id/draft-ietf-httpapi-idempotency-key-header-06.txt
> >
> > Yufei
> >
> >
> > On Tue, Sep 16, 2025 at 11:08 AM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Yufei,
> > >
> > > As I understand, the proposal is to allow re-submitted requests to
> > > succeed when the previous request with the same key (ID) completed on
> > > the server side while the client was in the process of re-trying.
> > >
> > > I agree that this is a valuable feature.
> > >
> > > However, my point about connecting it to tasks / operations is
> > > completely at the conceptual level.
> > >
> > > If one considers the server as a black box, the client effectively
> > > submits a request with some key and payload. Then the client submits
> > > another request with the same key and payload and expects the second
> > > request to return the result of the first request (success or failure).
> > > This is conceptually equivalent to the server executing requests as
> > > asynchronous tasks (the key being the task's ID).
> > >
> > > Therefore, before we consider any implementation for the "idempotency"
> > > feature, I believe it would be wise to consider possible synergies with
> > > the tasks proposal [1].
> > >
> > > I personally think these synergies are in fact quite large and
> > > powerful, especially considering Polaris Servers in a distributed
> > > environment.
> > >
> > > To put it another way, I believe implementing the "idempotency"
> > > proposal on top of the "tasks" proposal is going to be quite easy (if
> > > not trivial).
> > >
> > > [1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Sat, Sep 13, 2025 at 11:44 AM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi Dimitri,
> > > >
> > > > The idempotency key is mainly to reduce the chance of inconsistencies
> > > > of table commit. Here is a typical scenario: A commit succeeds but
> > > > the client misses the response (due to timeout, restart, etc.),
> > > > client retries and hits 409 Conflict, then mistakenly treats this as
> > > > a failed commit and cleans up related files, leading to
> > > > inconsistency. Without an idempotency key, the best a client could do
> > > > is to treat most of the non-200 response as unknown status, and
> > > > reload the table to verify, which is also inefficient.
> > > >
> > > > It could enable more use cases, as you mentioned, getting status of
> > > > an async task, but I would not complicate it at this point. That
> > > > probably deserves a separate proposal, since task status goes beyond
> > > > simple HTTP codes and would require clients to interpret them
> > > > correctly. For now, http_status is sufficient, as it’s already
> > > > well-supported by all existing clients.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Fri, Sep 12, 2025 at 7:58 PM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Huaxin.
> > > > >
> > > > > Thanks again for starting this proposal.
> > > > >
> > > > > After a quick initial read of the docs, this feels like a proposal
> > > > > to treat each REST change request as an identifiable "task" or
> > > > > "operation" where the ID is allocated by the client and the
> > > > > execution status is tracked by the server.
> > > > >
> > > > > Essentially, with the proposed feature implemented, it would be
> > > > > possible to provide a "get status" API for each "idempotency key".
> > > > >
> > > > > Does that make sense?
> > > > >
> > > > > Do you think the proposal could be reformulated in terms of
> > > > > "tasks/operations"? From my POV it might be clearer and easier to
> > > > > understand.
> > > > >
> > > > > In that regard, as far as Polaris is concerned, I think it connects
> > > > > well and could build on top of the tasks proposal [1].
> > > > >
> > > > > [1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > >
> > > > > On Thu, Sep 11, 2025 at 2:37 PM huaxin gao <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I’d like to propose adding *optional idempotent retries* to
> > > > > > Polaris REST mutation endpoints via an Idempotency-Key header.
> > > > > >
> > > > > > *The Problem*
> > > > > > When a mutation (e.g., POST /v1/tables/{name}/commit) succeeds
> > > > > > but the client doesn’t receive the response (timeout, restart,
> > > > > > etc.), a retry can hit 409 Conflict. Some clients then treat this
> > > > > > as a failed commit and clean up files that actually belong to the
> > > > > > committed snapshot, causing catalog/storage inconsistency.
> > > > > >
> > > > > > *The Proposed Solution*
> > > > > > Accept Idempotency-Key on mutation routes. The server binds the
> > > > > > key to a canonical payload hash and enforces:
> > > > > >
> > > > > >    - *Same key + same payload* -> return the original result (no
> > > > > >      re-execution).
> > > > > >    - *Same key + different payload* -> 422 Unprocessable Content.
> > > > > >    - *In-flight duplicate* -> 409 Conflict.
> > > > > >
> > > > > > Transient 5xx are never cached.
> > > > > >
> > > > > > *Discovery & Compatibility*
> > > > > > When serving Iceberg REST, Polaris can advertise support and
> > > > > > retention via GET /v1/config, e.g.:
> > > > > >
> > > > > > { "properties": { "idempotency-key-supported": "true",
> > > > > >                   "idempotency-key-lifetime": "PT30M" } }
> > > > > >
> > > > > > Fully backward compatible: servers may ignore the header; clients
> > > > > > enable retries only when discovery indicates support.
> > > > > >
> > > > > > This is the *server-side counterpart* to the Iceberg REST Catalog
> > > > > > Idempotency proposal
> > > > > > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0#heading=h.jfecktgonj1i>
> > > > > > I sent to the Iceberg community.
> > > > > > Here is the Polaris proposal
> > > > > > <https://docs.google.com/document/d/1L5A8Cspugsk1dW4Ij4dy5wB5FKa8KnaXT0LHBJdgq9w/edit?usp=sharing>.
> > > > > > Feedback welcome!
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Huaxin
> > > > > >
> > > > >
> > > >
> > >
> >
>
