Hi, I just saw this email thread, but I had already replied on the other discussion thread earlier. Can we keep the discussion to a single thread?
On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]> wrote:
>
> Hi Dmitri,
>
> Thanks a lot for the review and for calling this out! You’re right that PR
> 3205 only shows the persistence SPI, so the cross‑node behavior isn’t
> obvious from the code alone. I’ve updated the design doc with a new section
> “Multi‑node Coordination, Liveness, and Cleanup” that spells this out, but
> let me summarize the answers to your questions here:
>
> Retry hits a different node while the original is still running
>
> - Each request with an Idempotency-Key goes through an HTTP idempotency
>   filter (IdempotencyFilter) that calls
>   reserve(realmId, key, operationType, normalizedResourceId, expiresAt,
>   executorId, now).
> - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING on
>   (realm_id, idempotency_key), so only one node wins globally (OWNED);
>   all others see DUPLICATE + the existing record.
> - The winning node is the owner and periodically calls
>   updateHeartbeat(...) while the record is IN_PROGRESS, which keeps
>   heartbeat_at fresh.
> - When a duplicate lands on another node, it looks at heartbeat_at and a
>   short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤ lease_ttl,
>   the original owner is treated as active. The duplicate does not
>   re‑execute the mutation; it waits (with a small bounded poll) for the
>   record to become FINALIZED, then replays the stored result.
>
> Original node crashes without finalizing the record
>
> - In that case we still have an IN_PROGRESS row with an old heartbeat_at
>   and the original executor_id.
> - After lease_ttl has passed, subsequent requests with the same key see
>   IN_PROGRESS and a stale heartbeat (now − heartbeat_at > lease_ttl), so
>   the filter treats the owner as dead and invokes a follow‑up Reconciler
>   SPI.
> - The Reconciler does “finalize‑gap” reconciliation: it inspects the
>   catalog state for the (operationType, resourceId, realmId) to decide
>   whether the original mutation actually landed.
> - If the state already reflects the intended change, it can mark the
>   record FINALIZED and allow duplicates to replay an equivalent success
>   response without re‑executing the commit.
> - If the state does not reflect the change, it can re‑execute once on
>   the new node (“takeover”).
> - The current PR just introduces the fields needed for this (executor_id,
>   heartbeat_at, http_status, finalized_at, response_*); the filter +
>   Reconciler wiring comes in a follow‑up change.
>
> Cleanup of old idempotency records / collaboration across nodes
>
> - On first acceptance we set expires_at = firstAcceptance +
>   idempotency-key-lifetime + grace. This is a long‑lived retention
>   horizon, separate from the short liveness lease:
>   - lease_ttl (seconds) → “is the current owner still alive?”
>   - expires_at (minutes/hours+) → “how long must we remember this key
>     for duplicates/replay and when can we delete it?”
> - The SPI exposes purgeExpired(before), which is implemented as DELETE
>   FROM idempotency_records WHERE expires_at < ?. Any node (or a
>   maintenance job) can call this periodically. The delete is idempotent
>   and keyed only by time, so multiple nodes can run cleanup concurrently
>   without coordination; it’s intentionally a collaborative, best‑effort
>   process.
>
> Thanks,
> Huaxin
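For concreteness, the duplicate-handling flow described above might look roughly like the Java sketch below. The class, the record shapes, and the accessor names here are illustrative assumptions based on this thread, not code from PR 3205:

    import java.time.Duration;
    import java.time.Instant;

    class IdempotencyFlowSketch {

      // Short liveness lease from the description above (~25s).
      static final Duration LEASE_TTL = Duration.ofSeconds(25);

      // Hypothetical shapes, for illustration only.
      enum ReserveResultType { OWNED, DUPLICATE }
      record StoredRecord(String status, Instant heartbeatAt, String storedResponse) {}
      record ReserveResult(ReserveResultType type, StoredRecord existing) {}

      String handle(ReserveResult r, Instant now) {
        if (r.type() == ReserveResultType.OWNED) {
          // This node won the INSERT ... ON CONFLICT race: execute the
          // mutation, heartbeat while IN_PROGRESS, then finalize.
          return "execute";
        }
        StoredRecord rec = r.existing();
        if ("FINALIZED".equals(rec.status())) {
          // Replay the stored result; never re-execute the mutation.
          return "replay:" + rec.storedResponse();
        }
        // IN_PROGRESS: is the original owner still within its lease?
        boolean ownerAlive =
            Duration.between(rec.heartbeatAt(), now).compareTo(LEASE_TTL) <= 0;
        if (ownerAlive) {
          return "wait-then-replay"; // bounded poll until FINALIZED
        }
        return "reconcile"; // stale heartbeat: finalize-gap check or takeover
      }
    }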
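The two SQL statements mentioned above are also easy to sketch in plain JDBC. The table and column names follow the thread; the actual schema in the PR may differ:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.time.Instant;

    class PostgresIdempotencySketch {

      // Exactly one node can insert a given (realm_id, idempotency_key);
      // everyone else gets 0 rows inserted and must load the existing
      // record (the DUPLICATE case).
      boolean tryReserve(Connection c, String realmId, String key,
                         String executorId, Instant now, Instant expiresAt)
          throws SQLException {
        String sql =
            "INSERT INTO idempotency_records"
                + " (realm_id, idempotency_key, status, executor_id,"
                + "  heartbeat_at, expires_at)"
                + " VALUES (?, ?, 'IN_PROGRESS', ?, ?, ?)"
                + " ON CONFLICT (realm_id, idempotency_key) DO NOTHING";
        try (PreparedStatement ps = c.prepareStatement(sql)) {
          ps.setString(1, realmId);
          ps.setString(2, key);
          ps.setString(3, executorId);
          ps.setTimestamp(4, Timestamp.from(now));
          ps.setTimestamp(5, Timestamp.from(expiresAt));
          return ps.executeUpdate() == 1; // true => OWNED, false => DUPLICATE
        }
      }

      // Time-keyed and idempotent, so any number of nodes can run this
      // concurrently without coordination.
      int purgeExpired(Connection c, Instant before) throws SQLException {
        String sql = "DELETE FROM idempotency_records WHERE expires_at < ?";
        try (PreparedStatement ps = c.prepareStatement(sql)) {
          ps.setTimestamp(1, Timestamp.from(before));
          return ps.executeUpdate();
        }
      }
    }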
>
> On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > Hi Huaxin,
> >
> > I looked at the PR briefly. It appears to add a persistence SPI for
> > storing information related to the processing of Idempotency Keys.
> >
> > I do not have any particular comment about that interface itself ATM,
> > but I suppose it will be used (eventually) in conjunction with
> > coordinating request processing in multiple Polaris server nodes. This
> > aspect is not apparent in PR 3205 and IIRC it is not discussed in depth
> > in the proposal doc (please correct me if I missed that). The
> > updateHeartbeat() method is declared, but how it is supposed to be used
> > is not clear to me.
> >
> > So, in order to better understand how the proposed persistence SPI can
> > work end-to-end, could you expand on your proposal for coordinating
> > retry processing across nodes and how that will relate to persistence?
> >
> > Specifically, I mean the case when a retry is submitted by a client to
> > one node while another node is still processing the original request.
> > Another case is when the original request node crashes without updating
> > the persisted data.
> >
> > Additionally, what is your vision regarding the cleanup of old
> > Idempotency records? Is this a collaborative process where all server
> > nodes participate equally?
> >
> > Thanks,
> > Dmitri.
> >
> > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > Following the earlier idempotency proposal, I’ve opened a PR
> > > <https://github.com/apache/polaris/pull/3205> that adds the first
> > > implementation of idempotency retries to Polaris core and would like
> > > feedback on the Java design.
> > >
> > > The PR introduces:
> > >
> > > - org.apache.polaris.core.persistence.IdempotencyStore – an SPI to
> > >   reserve/load/finalize/purge idempotency records (with ReserveResult /
> > >   ReserveResultType).
> > > - org.apache.polaris.idempotency.IdempotencyRecord – the value type
> > >   for stored idempotency metadata.
> > > - PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a
> > >   relational‑jdbc implementation backed by
> > >   POLARIS_SCHEMA.idempotency_records, tested via
> > >   Testcontainers/Postgres.
> > >
> > > Please take a look at the PR and reply here or comment on the change
> > > with any concerns or suggestions.
> > >
> > > Thanks,
> > >
> > > Huaxin
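Putting the method names from this thread together, the SPI surface presumably looks something like the following; this is inferred from the emails above, not copied from the PR, and the real signatures will differ in detail:

    import java.time.Instant;
    import java.util.Optional;

    // Rough shape only; see PR 3205 for the authoritative definitions.
    interface IdempotencyStoreSketch {

      enum ReserveResultType { OWNED, DUPLICATE }

      record IdempotencyRecord(String realmId, String key, String status,
          String executorId, Instant heartbeatAt, Instant expiresAt) {}

      record ReserveResult(ReserveResultType type,
          Optional<IdempotencyRecord> existing) {}

      // Atomically claim the key; OWNED for the winner, DUPLICATE otherwise.
      ReserveResult reserve(String realmId, String key, String operationType,
          String normalizedResourceId, Instant expiresAt, String executorId,
          Instant now);

      Optional<IdempotencyRecord> load(String realmId, String key);

      // Called periodically by the owner while the record is IN_PROGRESS.
      void updateHeartbeat(String realmId, String key, String executorId,
          Instant now);

      // The thread calls this "finalize"; renamed here to avoid clashing
      // with java.lang.Object#finalize in a casual sketch.
      void finalizeRecord(String realmId, String key, int httpStatus,
          byte[] responseBody, Instant finalizedAt);

      int purgeExpired(Instant before);
    }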
