Thanks Robert and Dmitri. Sorry for the confusion — cross-mail-race-conditions definitely happen. To keep things simple, I will continue the discussion in the original idempotency thread. Thanks again for the feedback and discussion!
Huaxin

On Mon, Dec 8, 2025 at 2:30 PM Robert Stupp <[email protected]> wrote:
> Cross-mail-race-conditions happen ;)
>
> On Mon, Dec 8, 2025 at 7:53 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > Sorry, Robert, it might be my fault for causing this new thread. My
> > intention was to have a more focused discussion about java code changes
> > here.
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Dec 8, 2025 at 5:57 AM Robert Stupp <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > ah, just saw this email thread, but I replied to the other discussion
> > > thread earlier.
> > > Can we consider discussing this topic in just one thread?
> > >
> > > On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]> wrote:
> > > >
> > > > Hi Dmitri,
> > > >
> > > > Thanks a lot for the review and for calling this out! You’re right that
> > > > PR 3205 only shows the persistence SPI, so the cross‑node behavior isn’t
> > > > obvious from the code alone. I’ve updated the design doc with a new
> > > > section “Multi‑node Coordination, Liveness, and Cleanup” that spells
> > > > this out, but let me summarize the answers to your questions here:
> > > >
> > > > Retry hits a different node while the original is still running
> > > >
> > > > - Each request with an Idempotency-Key goes through an HTTP idempotency
> > > >   filter (IdempotencyFilter) that calls
> > > >   reserve(realmId, key, operationType, normalizedResourceId, expiresAt,
> > > >   executorId, now).
> > > > - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING on
> > > >   (realm_id, idempotency_key), so only one node wins globally (OWNED);
> > > >   all others see DUPLICATE + the existing record.
> > > > - The winning node is the owner and periodically calls
> > > >   updateHeartbeat(...) while the record is IN_PROGRESS, which keeps
> > > >   heartbeat_at fresh.
> > > > - When a duplicate lands on another node, it looks at heartbeat_at and a
> > > >   short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤ lease_ttl,
> > > >   the original owner is treated as active. The duplicate does not
> > > >   re‑execute the mutation; it waits (with a small bounded poll) for the
> > > >   record to become FINALIZED, then replays the stored result.
> > > >
> > > > Original node crashes without finalizing the record
> > > >
> > > > - In that case we still have an IN_PROGRESS row with an old heartbeat_at
> > > >   and the original executor_id.
> > > > - After lease_ttl has passed, subsequent requests with the same key see
> > > >   IN_PROGRESS and a stale heartbeat (now − heartbeat_at > lease_ttl), so
> > > >   the filter treats the owner as dead and invokes a follow‑up Reconciler
> > > >   SPI.
> > > > - The Reconciler does “finalize‑gap” reconciliation: it inspects the
> > > >   catalog state for the (operationType, resourceId, realmId) to decide
> > > >   whether the original mutation actually landed.
> > > >   - If the state already reflects the intended change, it can mark the
> > > >     record FINALIZED and allow duplicates to replay an equivalent
> > > >     success response without re‑executing the commit.
> > > >   - If the state does not reflect the change, it can re‑execute once on
> > > >     the new node (“takeover”).
> > > > - The current PR just introduces the fields needed for this (executor_id,
> > > >   heartbeat_at, http_status, finalized_at, response_*); the filter +
> > > >   Reconciler wiring comes in a follow‑up change.
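A small, self-contained Java sketch of the duplicate-path decision described in the quoted message may help make the lease rules concrete. This is not code from PR 3205: the StoredRecord and Decision types, the accessor names, and the 25-second constant are placeholders for illustration; only the overall flow (replay once FINALIZED, wait while the heartbeat is within lease_ttl, hand off to the Reconciler once the heartbeat is stale) follows the behavior described above.

import java.time.Duration;
import java.time.Instant;

/**
 * Illustrative sketch only: types, names, and constants are placeholders,
 * not the actual classes from PR 3205.
 */
public class DuplicateRequestSketch {

  enum Status { IN_PROGRESS, FINALIZED }

  /** Minimal stand-in for the persisted idempotency record. */
  record StoredRecord(Status status, Instant heartbeatAt, String executorId) {}

  enum Decision { REPLAY_STORED_RESULT, WAIT_FOR_OWNER, RECONCILE_AND_MAYBE_TAKE_OVER }

  /** Short liveness lease; deliberately much shorter than expires_at. */
  static final Duration LEASE_TTL = Duration.ofSeconds(25);

  /**
   * Decision taken by a node that lost the reserve() race (DUPLICATE) and
   * therefore holds the already-existing record.
   */
  static Decision onDuplicate(StoredRecord existing, Instant now) {
    if (existing.status() == Status.FINALIZED) {
      // Original request already completed: replay the stored response.
      return Decision.REPLAY_STORED_RESULT;
    }
    Duration sinceHeartbeat = Duration.between(existing.heartbeatAt(), now);
    if (sinceHeartbeat.compareTo(LEASE_TTL) <= 0) {
      // Owner is still heartbeating: never re-execute, poll until FINALIZED.
      return Decision.WAIT_FOR_OWNER;
    }
    // Stale heartbeat: owner presumed dead; the Reconciler checks whether the
    // mutation actually landed before any re-execution ("takeover").
    return Decision.RECONCILE_AND_MAYBE_TAKE_OVER;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    StoredRecord live = new StoredRecord(Status.IN_PROGRESS, now.minusSeconds(5), "node-a");
    StoredRecord stale = new StoredRecord(Status.IN_PROGRESS, now.minusSeconds(120), "node-a");
    System.out.println(onDuplicate(live, now));   // WAIT_FOR_OWNER
    System.out.println(onDuplicate(stale, now));  // RECONCILE_AND_MAYBE_TAKE_OVER
  }
}

The point of the lease check, as described above, is that a duplicate never re-executes while the owner is provably alive; takeover is only considered after the Reconciler has inspected the catalog state.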
> > > >
> > > > Cleanup of old idempotency records / collaboration across nodes
> > > >
> > > > - On first acceptance we set expires_at = firstAcceptance +
> > > >   idempotency-key-lifetime + grace. This is a long‑lived retention
> > > >   horizon, separate from the short liveness lease:
> > > >   - lease_ttl (seconds) → “is the current owner still alive?”
> > > >   - expires_at (minutes/hours+) → “how long must we remember this key
> > > >     for duplicates/replay and when can we delete it?”
> > > > - The SPI exposes purgeExpired(before) which is implemented as DELETE
> > > >   FROM idempotency_records WHERE expires_at < ?. Any node (or a
> > > >   maintenance job) can call this periodically. The delete is idempotent
> > > >   and keyed only by time, so multiple nodes can run cleanup concurrently
> > > >   without coordination; it’s intentionally a collaborative, best‑effort
> > > >   process.
> > > >
> > > > Thanks,
> > > > Huaxin
> > > >
> > > > On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > >
> > > > > Hi Huaxin,
> > > > >
> > > > > I looked at the PR briefly. It appears to add a persistence SPI for
> > > > > storing information related to the processing of Idempotency Keys.
> > > > >
> > > > > I do not have any particular comment about that interface itself ATM,
> > > > > but I suppose it will be used (eventually) in conjunction with
> > > > > coordinating request processing in multiple Polaris server nodes. This
> > > > > aspect is not apparent in PR 3205 and IIRC it is not discussed in depth
> > > > > in the proposal doc (please correct me if I missed that). The
> > > > > updateHeartbeat() method is declared, but how it is supposed to be used
> > > > > is not clear to me.
> > > > >
> > > > > So, in order to better understand how the proposed persistence SPI can
> > > > > work end-to-end, could you expand on your proposal for coordinating
> > > > > retry processing across nodes and how that will relate to persistence?
> > > > >
> > > > > Specifically, I mean the case when a retry is submitted by a client to
> > > > > one node while another node is still processing the original request.
> > > > > Another case is when the original request node crashes without updating
> > > > > the persisted data.
> > > > >
> > > > > Additionally, what is your vision regarding the cleanup of old
> > > > > Idempotency records? Is this a collaborative process where all server
> > > > > nodes participate equally?
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Following the earlier idempotency proposal, I’ve opened a PR
> > > > > > <https://github.com/apache/polaris/pull/3205> that adds the first
> > > > > > implementation of idempotency retries to Polaris core and would like
> > > > > > feedback on the Java design.
> > > > > >
> > > > > > The PR introduces:
> > > > > >
> > > > > > - org.apache.polaris.core.persistence.IdempotencyStore – an SPI to
> > > > > >   reserve/load/finalize/purge idempotency records (with
> > > > > >   ReserveResult / ReserveResultType).
> > > > > > - org.apache.polaris.idempotency.IdempotencyRecord – the value type
> > > > > >   for stored idempotency metadata.
> > > > > > - PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a
> > > > > >   relational‑jdbc implementation backed by
> > > > > >   POLARIS_SCHEMA.idempotency_records, tested via
> > > > > >   Testcontainers/Postgres.
> > > > > >
> > > > > > Please take a look at the PR and reply here or comment on the change
> > > > > > with any concerns or suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Huaxin
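For the Postgres-backed pieces mentioned earlier in the thread (the INSERT … ON CONFLICT DO NOTHING reservation and the time-keyed purgeExpired delete), the rough JDBC shape could look like the sketch below. The SQL statements follow the wording quoted above, but the table/column list and the method signatures here are simplified assumptions and are not taken from PostgresIdempotencyStore in PR 3205.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

/**
 * Illustrative JDBC sketch of the two statements described in the thread.
 * Column names beyond (realm_id, idempotency_key, expires_at) are simplified
 * and do not necessarily match the schema in PR 3205.
 */
public class IdempotencySqlSketch {

  /** Returns true if this node won the reservation (the row was inserted). */
  static boolean tryReserve(Connection conn, String realmId, String key,
                            String executorId, Instant now, Instant expiresAt)
      throws SQLException {
    String sql =
        "INSERT INTO idempotency_records"
            + " (realm_id, idempotency_key, executor_id, heartbeat_at, expires_at)"
            + " VALUES (?, ?, ?, ?, ?)"
            + " ON CONFLICT (realm_id, idempotency_key) DO NOTHING";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, realmId);
      ps.setString(2, key);
      ps.setString(3, executorId);
      ps.setTimestamp(4, Timestamp.from(now));
      ps.setTimestamp(5, Timestamp.from(expiresAt));
      // 1 row inserted -> OWNED; 0 rows -> DUPLICATE (another node owns the key).
      return ps.executeUpdate() == 1;
    }
  }

  /** Time-keyed cleanup; safe for any node to run concurrently. */
  static int purgeExpired(Connection conn, Instant before) throws SQLException {
    String sql = "DELETE FROM idempotency_records WHERE expires_at < ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setTimestamp(1, Timestamp.from(before));
      return ps.executeUpdate();
    }
  }
}

Because the delete is keyed only by expires_at, any number of nodes (or a separate maintenance job) can run it concurrently without extra coordination, which matches the collaborative, best-effort cleanup described above.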
