Cross-mail-race-conditions happen ;)
On Mon, Dec 8, 2025 at 7:53 PM Dmitri Bourlatchkov <[email protected]> wrote: > > Sorry, Robert, it might be my fault for causing this new thread. My > intention was to have a more focused discussion about java code changes > here. > > Cheers, > Dmitri. > > On Mon, Dec 8, 2025 at 5:57 AM Robert Stupp <[email protected]> wrote: > > > Hi, > > > > ah, just saw this email thread, but I replied to the other discussion > > thread earlier. > > Can we consider discussing this topic in just one thread? > > > > On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]> wrote: > > > > > > Hi Dmitri, > > > > > > Thanks a lot for the review and for calling this out! You’re right that > > PR > > > 3205 only shows the persistence SPI, so the cross‑node behavior isn’t > > > obvious from the code alone.I’ve updated the design doc with a new > > section > > > “Multi‑node Coordination, Liveness, and Cleanup” that spells this out, > > but > > > let me summarize the answers to your questions here: > > > > > > Retry hits a different node while the original is still running > > > > > > - Each request with an Idempotency-Key goes through an HTTP > > idempotency > > > filter (IdempotencyFilter) that calls > > > > > > reserve(realmId, key, operationType, normalizedResourceId, expiresAt, > > > executorId, now). > > > > > > > > > - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING > > > on (realm_id, > > > idempotency_key), so only one node wins globally (OWNED); all others > > see > > > DUPLICATE + the existing record. > > > - The winning node is the owner and periodically calls > > > updateHeartbeat(...) while the record is IN_PROGRESS, which keeps > > > heartbeat_at fresh. > > > - When a duplicate lands on another node, it looks at heartbeat_at > > and a > > > short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤ lease_ttl, > > > the original owner is treated as active. The duplicate does not > > re‑execute > > > the mutation; it waits (with a small bounded poll) for the record to > > become > > > FINALIZED, then replays the stored result. > > > > > > Original node crashes without finalizing the record > > > > > > - In that case we still have an IN_PROGRESS row with an old > > heartbeat_at > > > and the original executor_id. > > > - After lease_ttl has passed, subsequent requests with the same key > > see > > > IN_PROGRESS and a stale heartbeat (now − heartbeat_at > lease_ttl), so > > > the filter treats the owner as dead and invokes a follow‑up Reconciler > > > SPI. > > > - The Reconciler does “finalize‑gap” reconciliation: it inspects the > > > catalog state for the (operationType, resourceId, realmId) to decide > > > whether the original mutation actually landed. > > > - If the state already reflects the intended change, it can mark > > the > > > record FINALIZED and allow duplicates to replay an equivalent > > success > > > response without re‑executing the commit. > > > - If the state does not reflect the change, it can re‑execute once > > on > > > the new node (“takeover”). > > > - The current PR just introduces the fields needed for this > > (executor_id, > > > heartbeat_at, http_status, finalized_at, response_*); the filter + > > > Reconciler wiring comes in a follow‑up change. > > > > > > Cleanup of old idempotency records / collaboration across nodes > > > > > > - On first acceptance we set expires_at = firstAcceptance + > > > idempotency-key-lifetime + grace. This is a long‑lived retention > > > horizon, separate from the short liveness lease: > > > - lease_ttl (seconds) → “is the current owner still alive?” > > > - expires_at (minutes/hours+) → “how long must we remember this key > > > for duplicates/replay and when can we delete it?” > > > - The SPI exposes purgeExpired(before) which is implemented as DELETE > > > FROM idempotency_records WHERE expires_at < ?. Any node (or a > > > maintenance job) can call this periodically. The delete is idempotent > > and > > > keyed only by time, so multiple nodes can run cleanup concurrently > > without > > > coordination; it’s intentionally a collaborative, best‑effort process. > > > > > > Thanks, > > > Huaxin > > > > > > > > > On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]> > > wrote: > > > > > > > Hi Huaxin, > > > > > > > > I looked at the PR briefly. It appears to add a persistence SPI for > > storing > > > > information related to the processing of Idempotency Keys. > > > > > > > > I do not have any particular comment about that interface itself ATM, > > but I > > > > suppose it will be used (eventually) in conjunction with coordinating > > > > request processing in multiple Polaris server nodes. This aspect is not > > > > apparent in PR 3205 and IIRC it is not discussed in depth in the > > proposal > > > > doc (please correct me if I missed that). The updateHeartbeat() method > > is > > > > declared, but how it is supposed to be used is not clear to me. > > > > > > > > So, in order to better understand how the proposed persistence SPI can > > work > > > > end-to-end, could you expand on your proposal for coordinating retry > > > > processing across nodes and how that will relate to persistence? > > > > > > > > Specifically, I mean the case when a retry is submitted by a client to > > one > > > > node while another node is still processing the original request. > > Another > > > > case is when the original request node crashes without updating the > > > > persisted data. > > > > > > > > Additionally, what is your vision regarding the cleanup of old > > Idempotency > > > > records? Is this a collaborative process where all server nodes > > participate > > > > equally? > > > > > > > > Thanks, > > > > Dmitri. > > > > > > > > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]> > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > Following the earlier idempotency proposal, I’ve opened a PR > > > > > <https://github.com/apache/polaris/pull/3205> that adds the first > > > > > implementation of idempotency retries to Polaris core and would like > > > > > feedback on the Java design. > > > > > > > > > > The PR introduces: > > > > > > > > > > - > > > > > > > > > > org.apache.polaris.core.persistence.IdempotencyStore – an SPI to > > > > > reserve/load/finalize/purge idempotency records (with > > ReserveResult / > > > > > ReserveResultType). > > > > > > > > > > > > > > > - > > > > > > > > > > org.apache.polaris.idempotency.IdempotencyRecord – the value type > > for > > > > > stored idempotency metadata. > > > > > > > > > > > > > > > - > > > > > > > > > > PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a > > > > > relational‑jdbc implementation backed by > > > > > POLARIS_SCHEMA.idempotency_records, tested via > > > > Testcontainers/Postgres. > > > > > > > > > > Please take a look at the PR and reply here or comment on the change > > with > > > > > any concerns or suggestions. > > > > > > > > > > Thanks, > > > > > > > > > > Huaxin > > > > > > > > > > >
