Re: Subject: [DISCUSS] Idempotency-Key design for Iceberg REST: converging on Model B

Robert Stupp Sat, 30 May 2026 03:15:34 -0700

Hi all,

Thanks for the clarifications. Russell's explanation is especially useful.
I agree that ambiguous request outcomes, such as timeouts or reset network
connections, are hard to reason about.

Clients often cannot reliably reconcile from the current state alone for
operations that mutate table/view state.

I wonder whether the idempotency key should be recorded in the table/view
metadata as an "operation-id", with an explicit retention guarantee, maybe
tied to a server-provided minimum TTL.
The approach could reduce or change the role of a separate
idempotency-record table and handling of it.
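
To make the retention part a bit more concrete, here is a minimal Python
sketch. Everything in it is an assumption for illustration: OperationLog,
min_ttl_seconds and the method names are made up and are not part of the
Iceberg spec or of Polaris. It only shows the rule that an "operation-id"
may be dropped once it is older than the guaranteed minimum TTL, and never
before that.

```python
from dataclasses import dataclass, field


@dataclass
class OperationLog:
    """Hypothetical per-table/view log of committed operation-ids."""

    min_ttl_seconds: float  # assumed server-provided minimum retention
    entries: dict = field(default_factory=dict)  # operation-id -> commit time

    def record(self, operation_id, now):
        # Remember when the operation was committed.
        self.entries[operation_id] = now

    def contains(self, operation_id):
        return operation_id in self.entries

    def prune(self, now):
        """Drop only entries that have outlived the minimum TTL."""
        expired = [
            op for op, committed_at in self.entries.items()
            if now - committed_at >= self.min_ttl_seconds
        ]
        for op in expired:
            del self.entries[op]
        return expired
```

For this to be a real guarantee, other maintenance (expire snapshots, for
example) would have to leave these entries alone until they have aged out.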

Request handling could roughly look like this:
  if the current history/metadata already contains that "operation-id",
    return equivalent-enough response without re-running the operation.

  try the committing operation:
  if the commit succeeds:
    record the "operation-id" in the table/view metadata, and
    return the successful response.
  if the commit runs into a conflict:
    re-check whether the current metadata/history contains that
    "operation-id"
    if so:
      return equivalent-enough response.
    otherwise:
      return the conflict response.
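
The same flow as a small runnable Python sketch against a toy in-memory
catalog. Catalog, TableMetadata, operation_ids and handle are hypothetical
names, the returned strings merely stand in for HTTP responses, and the
sketch assumes that the "operation-id" is written in the same atomic commit
as the change itself.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Stale expected version; stands in for an HTTP 409."""


@dataclass
class TableMetadata:
    version: int = 0
    properties: dict = field(default_factory=dict)
    operation_ids: set = field(default_factory=set)  # hypothetical field


class Catalog:
    """Toy single-table catalog with optimistic concurrency."""

    def __init__(self):
        self.table = TableMetadata()

    def commit(self, expected_version, updates, operation_id):
        if expected_version != self.table.version:
            raise CommitConflict()
        # Assumption: change and operation-id land in one atomic commit.
        self.table.properties.update(updates)
        self.table.operation_ids.add(operation_id)
        self.table.version += 1


def handle(catalog, expected_version, updates, operation_id):
    # Already committed earlier: answer without re-running the operation.
    if operation_id in catalog.table.operation_ids:
        return "replayed"
    try:
        catalog.commit(expected_version, updates, operation_id)
        return "committed"
    except CommitConflict:
        # The original attempt may have landed in the meantime: re-check.
        if operation_id in catalog.table.operation_ids:
            return "replayed"
        return "conflict"
```

A retry of an already committed request gets "replayed" even though its
expected version is stale by then, while an unrelated request with a stale
version still gets "conflict".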

This is not perfect either: it needs spec work and retention rules, and it
may only work for table and view operations.

I mostly want to separate the questions:
1. What guarantees do clients actually need after an ambiguous outcome?
2. Where should the durable evidence for the guarantee live?

Robert

On Sat, May 30, 2026 at 4:30 AM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi Russell,
>
> Thanks for the information! It clarifies the use case a lot (at least for
> me :)
>
> In short, I'd say the main benefit is allowing clients to avoid conflicts
> (409) on re-submitting changes that got committed by the server without the
> client receiving confirmation of the success.
>
> I believe the Iceberg REST Catalog spec [1] is formally stricter than Model
> B when it states "the server ensures no additional effects for requests
> that carry the same Idempotency-Key". Since Model B permits request
> re-execution, the possibility of additional side effects cannot be ruled
> out completely based on the proposed server-side algorithm alone. The
> server must assume that the client forms the (change) request in such a way
> that only one execution attempt can succeed (e.g. by using "update
> requirements"). This is also mentioned in comments on the doc [2].
>
> This is probably worth mentioning in the Polaris docs related to
> our Idempotency-Key implementation.
>
> Assuming this kind of cooperation on the client side, I believe Model B can
> be considered compliant with the spec [1].
>
> In anticipation of fresh implementation PRs for this feature, I'd like to
> re-emphasize (IIRC I mentioned this before) that, I think, we should avoid
> coupling Idempotency persistence with MetaStore persistence (both code-wise
> and transaction-wise). Model B processes Idempotency-related data outside
> the original change request's execution scope. Idempotency decisions are
> made either before the request starts executing or after it is committed to
> the MetaStore.
>
> [1]
>
> https://github.com/apache/polaris/blob/4e4eaf840bf71d431b13034b0dd6f338261d8e8b/spec/iceberg-rest-catalog-open-api.yaml#L2098
>
> [2]
>
> https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0
>
> Cheers,
> Dmitri.
>
> On Fri, May 29, 2026 at 8:26 PM Russell Spitzer <[email protected]>
> wrote:
>
> > The problem with a client attempting to determine whether its operations
> > succeeded via load table, and the reason all this work has proceeded, is
> > that there is no guaranteed path for a client to actually determine
> > whether a commit occurred. There are too many legitimate mechanisms to
> > erase history from an Iceberg table to guarantee an operation occurred.
> >
> > For example, you could check if your snapshot exists in snapshot history
> > but this could have been erased by expire snapshots.
> >
> > Or you could check if the schema was modified according to your update,
> > but this too could have been undone by another operation. Client A adds a
> > column but gets a timeout, Client B removes the column, Client A retries
> > and adds the column again.
> >
> > Because of this, the Iceberg client usually just bails out to the user
> > with an exception if it doesn’t get an actual confirmation that the commit
> > succeeded from the server. This leaves the “can I retry or not” question
> > as an exercise for the end user.
> >
> > In practice, actual Iceberg users work around this sort of thing by
> > adding all sorts of custom metadata to hopefully persist history in the
> > table itself in some way that can’t be touched by expire snapshots, but
> > this is usually very fragile and also relies on all clients behaving
> > well. I’ve seen folks use custom table properties, for example
> > “batch-5: committed”, then manually have their own retry logic check
> > whether this property is set. Then, of course, they also have to add a
> > bunch of custom logic to make sure they clean up this state as well.
> >
> > This is why Iceberg added the Idempotency path in the first place: it
> > gives us a guaranteed way for clients to retry in case of a network issue
> > or catalog issue, with a guarantee they will not do duplicate work by
> > retrying. With this in place the client can now cleanly retry (within the
> > idempotency window) the same operation over and over without throwing an
> > exception to the end user. Only in a situation where the catalog cannot
> > respond over a very long time will the user actually have to do some sort
> > of reconciliation. You can look at the history of the Iceberg client’s
> > retry behavior with ambiguous server side or network errors to see how
> > this has been a problem in the past.
> >
> > On Fri, May 29, 2026 at 1:24 PM huaxin gao <[email protected]>
> > wrote:
> >
> > > Hi Robert,
> > >
> > > Thanks for your reply!
> > >
> > > You're right that Model B does not prevent duplicate execution. The
> > > record is written only after success. So if a client times out while
> > > the first request is still running, a retry can run the handler again.
> > > There is no record yet to stop it. So Model B is "remember and replay a
> > > successful result," not "run exactly once."
> > >
> > > On the trade-off: Model A gives a stronger guarantee, but it needs
> > > reserve/heartbeat/purge state, which adds complexity and overhead.
> > > Model B is simpler and cheaper. The window it leaves open is small, and
> > > a client only retries after a timeout, so racing first requests should
> > > be rare in practice. Every design is a trade-off, and my view is that
> > > Model B is the right one here.
> > >
> > > It also helps to be clear about where duplicate-work protection really
> > > comes from. It comes from the catalog itself, not from idempotency. The
> > > catalog uses optimistic concurrency. If two first attempts race, at most
> > > one commit wins and the other gets a 409. Idempotency sits on top of
> > > that. It does not replace it.
> > >
> > > So what does Model B add over "the client just calls loadTable and
> > > reconciles"? Two things that I think are real:
> > >
> > >   1. The 422 check. loadTable can tell a client that a table exists. It
> > >      cannot tell the client that the table THEY created with THIS key
> > >      is the one that succeeded. The record binds the key to (principal,
> > >      operation, resource). If the same key is reused for a different
> > >      request, the server returns 422. The client cannot detect this on
> > >      its own.
> > >
> > >   2. One server-side behavior for all mutating ops. create-table
> > >      happens to reconcile cleanly with loadTable. But the point of the
> > >      Idempotency-Key header is that the client should not have to write
> > >      reconciliation logic for every operation. For a known key, the
> > >      server turns what would be a 409 into an equivalent 2xx replay.
> > >      The client gets a clean success instead of an error it has to
> > >      special-case.
> > >
> > > There is a third, weaker benefit: once a record exists, retries stop
> > > seeing flip-flopping results. But that only helps after a record
> > > exists, which is exactly the window you pointed out is unprotected.
> > >
> > > So I'll correct my earlier wording. This is not convergence on exactly-
> > > once idempotency. It is a narrower guarantee: replay a recorded result,
> > > plus detect key misuse. It sits on top of the catalog's existing
> > > concurrency control. The real question for the list is simple: is that
> > > narrower guarantee worth shipping on its own? Or do we need Model A's
> > > in-flight protection to have a strong idempotency guarantee?
> > >
> > > My view is that the narrow version is worth it for now: it's the
> > > behavior the spec asks for, the 422 check can't be done client-side,
> > > and it's a small change we can strengthen toward Model A later if a
> > > real use case needs it. Happy to hear what others think.
> > >
> > > Best,
> > > Huaxin
> > >
> > > On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:
> > >
> > > > Hi Huaxin,
> > > >
> > > > Thanks for writing this up and moving the design discussion back to
> > > > dev@.
> > > >
> > > > Since you’re asking before locking in the implementation, I think we
> > > > should clarify one point.
> > > >
> > > > Model B is certainly simpler than the lease-based approach, but I’m
> > > > not sure I fully understand what problem it still solves.
> > > >
> > > > As I read it, if a client times out while the original request is
> > > > still running, a retry with the same key may not see an idempotency
> > > > record yet and could run the handler again.
> > > > So this feels less like preventing duplicate execution and more like
> > > > remembering a successful result after the fact.
> > > >
> > > > For the create-table case, couldn’t a client achieve roughly the same
> > > > recovery by calling loadTable after an ambiguous timeout and
> > > > reconciling from there?
> > > > Since Model B also rebuilds the response from current catalog state,
> > > > I’m trying to understand what it gives us beyond that.
> > > >
> > > > I’m not against simplifying the design, but I think we should be
> > > > clear about the narrower guarantee before calling this convergence.
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > >
> > > > On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've simplified the proposed design for Idempotency-Key support in
> > > > > Polaris (Iceberg REST spec: retries with the same key must not
> > > > > produce additional side effects), and I'd like a wider review before
> > > > > updating the implementation PR (#4269
> > > > > <https://github.com/apache/polaris/pull/4269>).
> > > > >
> > > > > What changed
> > > > >
> > > > >   - Before (Model A, lease-based): reserve an idempotency row before
> > > > >     doing work → IN_PROGRESS / heartbeat → finalize after.
> > > > >   - After (Model B, optimistic commit): run the handler first →
> > > > >     record only after a successful (2xx) outcome. The record stores
> > > > >     binding + status, not the HTTP response body. Retries with the
> > > > >     same key re-derive an equivalent response from current catalog
> > > > >     state instead of replaying a stored payload.
> > > > >
> > > > > The design doc still compares Model A and Model B side-by-side so
> > > > > the trade-offs are explicit. So far the discussion has been leaning
> > > > > toward Model B: mutating REST operations only, 2xx-only
> > > > > persistence, no response-body storage, and the known trade-offs
> > > > > (e.g. concurrent first-request races; see the NOTES section in the
> > > > > doc).
> > > > >
> > > > > Does this direction look right before we lock in the
> > > > > implementation?
> > > > >
> > > > > Comments on the doc
> > > > > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > > > > or replies on this thread both work.
> > > > >
> > > > > Thanks,
> > > > > Huaxin
> > > > >
> > > >
> > >
> >
>
