Hi Robert,
Quick follow-up to my mail from yesterday: I’ve just updated the proposal text to incorporate your comments. In particular:

- Finalization rules: I clarified that the server MUST NOT return 2xx if commit/update preconditions (expected base snapshot, requested schema changes, etc.) are not satisfied; in those cases the handler returns an appropriate 4xx, and that 4xx is what the idempotency layer finalizes and replays.
- Replay failure: I added an explicit replay-failure path. If a previously finalized result can no longer be reproduced (e.g., the table was dropped and its metadata purged), the server returns a 5xx with subtype idempotency_replay_failed and does not try to re-run the old mutation.
- Multi-node Coordination: I wrote down the stale-lease/reconciliation behavior more concretely and noted that pod restarts/crashes show up as missing heartbeats; duplicates then see a stale lease, run reconciliation once, and either return the original result or a 503 rather than waiting indefinitely.
- Failure modes: I added a short note after the IdempotencyStore SPI describing how we handle a pluggable backend that is down or slow: coordination-critical paths fail fast with a defined 5xx, while heartbeat/finalize are best-effort so Polaris itself doesn’t get stuck.
- Non-Goals: I tightened this section to state explicitly that this iteration only targets the built-in Polaris Iceberg REST catalog, not federated or non-IRC APIs.

To make the SPI shape and the lease/reconciliation check easier to discuss, I’ve also pasted two rough, non-normative sketches at the very bottom of this mail, below the quoted thread. Happy to tweak the wording further if you think any of these areas still need more precision.

Best,
Huaxin

On Mon, Dec 8, 2025 at 8:50 PM huaxin gao <[email protected]> wrote:

> Hi Robert,
>
> Thanks a lot for taking the time to write such a detailed note — I really appreciate the careful review and the references back to the Iceberg spec. I agree this cuts across API, persistence, and distributed-systems concerns, so we need to get the design right before we treat it as “done”.
>
> On a few of your main points:
>
> - Scope & Iceberg semantics
>
> The “key-only semantics” phrase is my shorthand, not Iceberg wording. What I meant is: in the earlier Iceberg mailing-list discussion we converged on baseline key-only idempotency aimed at low-level/network retries, with no payload fingerprinting in the protocol. Servers treat Idempotency-Key as an opaque token bound to a single operation/resource/realm; if we ever explore payload-binding, that would be a separate follow-up discussion, not part of the current design.
>
> - Key vs fingerprinting
>
> I agree that a bare Idempotency-Key + entity identifier is not enough to protect against buggy or malicious clients; fingerprinting the full request would be stronger. For v1 I was trying to stay aligned with the Iceberg REST spec (client-supplied key, no payload fingerprint in the contract) and keep the server implementation simple, but I’ll add a section that:
>
>   - calls out the risk you describe (two different logical requests reusing the same key),
>   - spells out the binding we do enforce (key + operation type + resource + realm), and
>   - treats request-fingerprinting as a possible follow-on enhancement rather than something we’re silently ignoring.
>
> - Multi-pod, liveness and “followers waiting forever”
>
> I’ve expanded the design doc to describe the heartbeat/lease mechanism and the behavior when the primary pod dies. In short: we don’t let followers wait unboundedly.
> Each owner periodically calls updateHeartbeat, and duplicates only wait while now − heartbeat_at is within a short lease window; once that lease expires, we hand control to a reconciliation step rather than continuing to block. I’ll make sure this algorithm and its failure modes (including timeouts/back-pressure limits) are written down more rigorously, not just implied in the text.
>
> - Backend pluggability and failure
>
> I agree that if the idempotency backend is pluggable, the design has to cover the backend-down / backend-nuked case explicitly so Polaris doesn’t just hang. I will add a note in the design doc: if the idempotency backend is unavailable, we must fail requests in a bounded way (not hang) and treat heartbeat/finalize as best-effort so Polaris doesn’t get stuck.
>
> - Quarkus collaboration and scope
>
> I like the idea of collaborating with the Quarkus community on a more generic JAX-RS idempotency layer, and I agree there’s nothing inherently “Polaris-only” about many of these concerns. For the moment I’d still like to keep this proposal scoped to the Polaris REST catalog (IRC) so we can converge on concrete semantics there first, but I’ll add a short “future work” section that talks about factoring out the generic pieces and exploring Quarkus integration once we have agreement on the core behavior.
>
> Best,
> Huaxin
>
> On Mon, Dec 8, 2025 at 2:23 AM Robert Stupp <[email protected]> wrote:
>
>> Hi,
>>
>> > Spec alignment: Iceberg chose key-only semantics (no payload fingerprinting)
>>
>> I do not see this "key-only semantics" mentioned anywhere in the Iceberg spec [1]. The Iceberg spec requirement "The idempotency key must be globally unique" [1] OTOH is impossible for a client to guarantee.
>>
>> It is "relatively easy" for clients to implement some retry mechanism and add some HTTP header (it is probably also not easy for clients to implement properly, see all the issues that happened in the past wrt when to (not) throw a CommitFailedException). It is definitely a complex task for servers.
>>
>> This feature touches many application, HTTP, security and distributed systems aspects and we should be very careful. I'd like to repeat my proposal to collaborate with the Quarkus community, because they have extensive knowledge about all these things and are very open to collaboration. I do not see any "specialties" that are unique to Polaris and force us to come up with our very own implementation.
>>
>> In any case, we should first design this functionality very carefully. Consider all use cases, the potential logical and technical states, exceptions, race conditions and failure scenarios. After we have consensus on all that, we can move on to the code.
>>
>> Some comments around the design and the feature itself:
>> * The multi-table-commit endpoint doesn't seem to fit into the design (many "resources", not one)?
>> * A resource has been deleted before the current state can be served ("delete" succeeds before the "follower of an update" finishes). The idempotent-request code would yield "serve this table-metadata" - but it cannot, as the metadata has been purged.
>> * Two "idempotent requests" yielding different results, racing with one another. While that's mentioned in the Iceberg spec as "response body *may* reflect a newer state", I am not convinced this is what all clients can cope with.
>> For example, a client that adds a column to the schema expects that column to be present upon successful request completion. But the "response body may reflect a newer state" exception means that an "add column" operation can legitimately yield a schema without the added column. Similar for column type changes, column removals, extending to sort-orders and partition-specs and lots more. This can lead to subtle issues in all clients and query engines. Isn't it a server bug to yield a success response to an Iceberg update-table request with non-fulfillable update-request-requirements?
>> * "Sole idempotency-key + entity-identifier" is not enough. Two identical idempotency keys, either intentional or due to a bug, would lead to wrong responses. This would _not_ be a problem with fingerprinting _all_ inputs for the operation. All existing implementations and experience reports/posts that I could find do request fingerprinting, including the request body.
>> * It is unclear whether this feature is intended for the "built in" Polaris Iceberg catalog or whether it also includes federated catalogs. Is it useful for non-IRC APIs?
>>
>> Some comments around the technical things. All these could be "offloaded" to Quarkus, leveraging async event loop processing and circuit breaking:
>> * If a pod executing the "primary" request dies, do "followers" wait "forever"? What is the actual distributed algorithm to resolve this without having a lot of threads spinning for a long time? This can happen for buggy clients, bad client configurations or intentionally bad clients.
>> * Technical failure and rolling-restart/upgrade scenarios should be considered.
>> * As the "idempotent request coordination backend" is pluggable, the design should also consider the case that the backend state is nuked, becomes unresponsive or fails in the meantime. We should avoid failing Polaris if this subsystem fails, or letting this subsystem be a reason for its existence (aka retry due to timeouts because the idempotent-request subsystem hangs).
>>
>> Robert
>>
>> [1] https://github.com/apache/iceberg/blob/19b4bd024486d9d516d0e547e273419c1bc7074e/open-api/rest-catalog-open-api.yaml#L1933-L1961
>>
>> On Wed, Nov 26, 2025 at 7:08 PM huaxin gao <[email protected]> wrote:
>> >
>> > Thanks Robert for the thoughtful note!
>> >
>> > A generic JAX-RS/Quarkus idempotency layer would be useful broadly, and Quarkus’s distributed cache is a good building block. For Polaris, though, we need a few things that go beyond caching or generic locking:
>> >
>> > - No external lock service: we use an atomic “first-writer-wins” reserve via a unique key in durable storage (single upsert), so exactly one node owns a key; others see the existing row.
>> > - Spec alignment: Iceberg chose key-only semantics (no payload fingerprinting). Safety comes from first-acceptance plus binding {operationType, resourceId, realm}; mismatched reuse -> 422; duplicates do not re-execute.
>> > - Liveness and failover: heartbeat/lease while IN_PROGRESS and reconciliation on stale owners (finalize-gap/takeover) so duplicates don’t block indefinitely and we avoid double execution.
>> > - Durable replay: persist a minimal, equivalent response (not in-memory/TTL cache) and a clear status policy (finalize only 2xx/terminal 4xx; never 5xx).
>> >
>> > Phase I: I’ll focus on implementing this in Polaris behind a small storage-agnostic SPI and wiring the flows.
>> >
>> > Phase II: we can revisit extracting the core into a reusable JAX-RS/Quarkus module, but for now I’d like to keep the scope on shipping Polaris v1.
>> >
>> > Thanks,
>> > Huaxin
>> >
>> > On Tue, Nov 25, 2025 at 11:18 PM Robert Stupp <[email protected]> wrote:
>> >
>> > > Hi all,
>> > >
>> > > To build an idempotent service, it seems necessary to consider some things, naming a few:
>> > > * distributed locking, resilient to failure scenarios
>> > > * distributed caching
>> > > * request fingerprinting
>> > > * request failure scenarios
>> > >
>> > > I think a generic JAX-RS idempotency functionality would be beneficial not just for Polaris. I can imagine that the Quarkus project would be very interested in such a thing. For example, Quarkus already has functionality for distributed caching in place, which is a building block for idempotent responses. Have we considered joining forces with them and leveraging synergies?
>> > >
>> > > Robert
>> > >
>> > > On Wed, Nov 26, 2025 at 4:57 AM huaxin gao <[email protected]> wrote:
>> > > >
>> > > > Hi Dmitri,
>> > > >
>> > > > Thanks for the reply and the detailed comments in the proposal. You’re right: the goal is to implement the recently approved Iceberg Idempotency-Key spec, and we don’t plan any additional REST Catalog API changes in Polaris. I’ve refocused the proposal on the server-side implementation and agree we should land the REST Catalog work first, then extend to the Management API.
>> > > >
>> > > > I addressed your inline comments and added a small, backend-agnostic Idempotency Persistence API (reserve/load/heartbeat/finalize/purge) so it works across all storage backends (Postgres first).
>> > > >
>> > > > On the async tasks framework: agreed — there are synergies. I’ll keep this in mind and align the idempotency store semantics with the async tasks model.
>> > > >
>> > > > Best,
>> > > > Huaxin
>> > > >
>> > > > On Tue, Nov 25, 2025 at 12:21 PM Dmitri Bourlatchkov <[email protected]> wrote:
>> > > >
>> > > > > Hi Huaxin,
>> > > > >
>> > > > > Thanks for resuming this proposal!
>> > > > >
>> > > > > In general, I suppose the intention is to implement the recently approved Iceberg REST Catalog spec change for Idempotency Keys. With that in mind, I believe the Polaris proposal probably needs to be more focused on the server side implementation now that the API spec has been finalized. I do not think Polaris needs any other API changes in the REST Catalog on top of the Iceberg spec.
>> > > > >
>> > > > > I'd propose to deal with the REST Catalog API first and then extend to the Management API (for the sake of simplicity).
>> > > > >
>> > > > > I added some more specific comments in the doc, but overall, I believe we need to consider what needs to be changed in the java Persistence API in Polaris because the idempotency feature probably applies to all backends.
>> > > > >
>> > > > > Also, as I commented [1] in earlier emails about this proposal, I believe some synergies can be found with the async tasks framework [2].
>> > > > > The main point here is orchestrating request execution among a set of distributed server nodes.
>> > > > >
>> > > > > [1] https://lists.apache.org/thread/28hx9kl4qmm5sho8jxmjlt6t0cd0hn6d
>> > > > >
>> > > > > [2] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
>> > > > >
>> > > > > Cheers,
>> > > > > Dmitri.
>> > > > >
>> > > > > On Sat, Nov 22, 2025 at 7:50 PM huaxin gao <[email protected]> wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I would like to restart the discussion on Idempotency-Key support in Polaris. This proposal focuses on Polaris server-side behavior and implementation details, with the Iceberg spec as the baseline API contract. Thanks for your review and feedback.
>> > > > > >
>> > > > > > Polaris Idempotency Key Proposal
>> > > > > > <https://docs.google.com/document/d/1ToMMziFIa7DNJ6CxR5RSEg1dgJSS1zFzZfbngDz-EeU/edit?tab=t.0#heading=h.ecn4cggb6uy7>
>> > > > > >
>> > > > > > Iceberg Idempotency Key Proposal
>> > > > > > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0#heading=h.jfecktgonj1i>
>> > > > > >
>> > > > > > Best,
>> > > > > > Huaxin
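P.S. Since a few of the points above are easier to discuss against something concrete, here is a rough, non-normative sketch of the storage-agnostic SPI shape (reserve/load/heartbeat/finalize/purge) and the record it persists. Everything beyond those five operations (type, field and parameter names, and the exact record layout) is a placeholder for discussion, not the final API from the design doc.

import java.time.Instant;
import java.util.Optional;

/**
 * Sketch of the storage-agnostic idempotency SPI
 * (reserve/load/heartbeat/finalize/purge). Names are placeholders.
 */
public interface IdempotencyStore {

  enum Status { IN_PROGRESS, FINALIZED }

  /** Minimal persisted record: the key binding plus the replayable result. */
  record IdempotencyRecord(
      String key,             // client-supplied Idempotency-Key (opaque token)
      String realm,
      String operationType,   // e.g. "updateTable"; part of the enforced binding
      String resourceId,
      Status status,
      Instant heartbeatAt,    // last liveness signal from the owning node
      Integer httpStatus,     // set once FINALIZED: 2xx or terminal 4xx, never 5xx
      String responseBody) {} // minimal equivalent response used for replay

  /**
   * Atomic first-writer-wins reserve (a single conditional upsert guarded by a
   * unique (realm, key) constraint). Returns empty if this caller won the
   * reservation and now owns the key; otherwise returns the existing record.
   */
  Optional<IdempotencyRecord> reserve(
      String key, String realm, String operationType, String resourceId);

  Optional<IdempotencyRecord> load(String key, String realm);

  /** Best-effort liveness signal while the owner is still IN_PROGRESS. */
  void updateHeartbeat(String key, String realm, Instant now);

  /** Called "finalize" in the doc; only 2xx/terminal 4xx results are stored. */
  void finalizeResult(String key, String realm, int httpStatus, String responseBody);

  void purge(String key, String realm);
}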
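And a second sketch, building on the interface above, of the duplicate-request path: wait only while the owner's lease looks fresh, then reconcile once and answer (or give up with a 503) instead of blocking indefinitely. The lease length, the poll interval, and the shape of the reconciliation step here are illustrative assumptions, not values from the design doc.

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Sketch of how a duplicate request behaves while another node owns the key. */
final class DuplicateRequestFlow {

  private static final Duration LEASE = Duration.ofSeconds(30);  // assumed value
  private static final Duration POLL = Duration.ofMillis(500);   // assumed value

  private final IdempotencyStore store;

  DuplicateRequestFlow(IdempotencyStore store) {
    this.store = store;
  }

  /** Returns the HTTP status to replay, or 503 if reconciliation cannot decide. */
  int handleDuplicate(String key, String realm) throws InterruptedException {
    while (true) {
      IdempotencyStore.IdempotencyRecord rec = store.load(key, realm).orElseThrow();

      if (rec.status() == IdempotencyStore.Status.FINALIZED) {
        // Replay the persisted result. If it can no longer be reproduced
        // (e.g. table dropped, metadata purged), the caller maps that to a
        // 5xx with subtype idempotency_replay_failed instead of re-running
        // the original mutation.
        return rec.httpStatus();
      }

      Duration sinceHeartbeat = Duration.between(rec.heartbeatAt(), Instant.now());
      if (sinceHeartbeat.compareTo(LEASE) > 0) {
        // Stale lease: the owner stopped heartbeating (crash, pod restart, ...).
        // Run reconciliation once (finalize-gap / takeover per the design doc);
        // either it surfaces the original result or we give up with a 503.
        return reconcileOnce(key, realm).orElse(503);
      }

      // Owner still looks alive: bounded poll; overall request timeouts and
      // back-pressure limits still apply on top of this loop.
      Thread.sleep(POLL.toMillis());
    }
  }

  private Optional<Integer> reconcileOnce(String key, String realm) {
    // Placeholder for the real reconciliation step; here we simply re-check
    // whether a finalized result appeared in the meantime.
    return store.load(key, realm)
        .filter(r -> r.status() == IdempotencyStore.Status.FINALIZED)
        .map(IdempotencyStore.IdempotencyRecord::httpStatus);
  }
}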
