Thanks, Robert and Dmitri.
Sorry for the confusion — cross-mail-race-conditions definitely happen. To
keep things simple, I will continue the discussion in the original
idempotency thread.
Thanks again for the feedback and discussion!

Huaxin

On Mon, Dec 8, 2025 at 2:30 PM Robert Stupp <[email protected]> wrote:

> Cross-mail-race-conditions happen ;)
>
> On Mon, Dec 8, 2025 at 7:53 PM Dmitri Bourlatchkov <[email protected]>
> wrote:
> >
> > Sorry, Robert, it might be my fault for causing this new thread. My
> > intention was to have a more focused discussion about Java code changes
> > here.
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Dec 8, 2025 at 5:57 AM Robert Stupp <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > ah, just saw this email thread, but I replied to the other discussion
> > > thread earlier.
> > > Can we consider discussing this topic in just one thread?
> > >
> > > On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]>
> > > wrote:
> > > >
> > > > Hi Dmitri,
> > > >
> > > > Thanks a lot for the review and for calling this out! You’re right
> > > > that PR 3205 only shows the persistence SPI, so the cross‑node
> > > > behavior isn’t obvious from the code alone. I’ve updated the design
> > > > doc with a new section “Multi‑node Coordination, Liveness, and
> > > > Cleanup” that spells this out, but let me summarize the answers to
> > > > your questions here:
> > > >
> > > > Retry hits a different node while the original is still running
> > > >
> > > >    - Each request with an Idempotency-Key goes through an HTTP
> > > >    idempotency filter (IdempotencyFilter) that calls
> > > >    reserve(realmId, key, operationType, normalizedResourceId,
> > > >    expiresAt, executorId, now).
> > > >    - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING
> > > >    on (realm_id, idempotency_key), so only one node wins globally
> > > >    (OWNED); all others see DUPLICATE + the existing record.
> > > >    - The winning node is the owner and periodically calls
> > > >    updateHeartbeat(...) while the record is IN_PROGRESS, which keeps
> > > >    heartbeat_at fresh.
> > > >    - When a duplicate lands on another node, it looks at heartbeat_at
> > > >    and a short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤
> > > >    lease_ttl, the original owner is treated as active. The duplicate
> > > >    does not re‑execute the mutation; it waits (with a small bounded
> > > >    poll) for the record to become FINALIZED, then replays the stored
> > > >    result. (A rough sketch of this flow is shown right after this
> > > >    list.)
> > > >
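> > > > For concreteness, here is a rough Java-style sketch of that filter
> > > > logic. It shows the control flow only: field wiring and imports are
> > > > omitted, and helpers such as startHeartbeat, pollUntilFinalized and
> > > > replay are placeholder names rather than the API in PR 3205.
> > > >
> > > >   Response handle(String realmId, String key, Request request) throws Exception {
> > > >     ReserveResult result = store.reserve(
> > > >         realmId, key, operationType(request), normalizedResourceId(request),
> > > >         expiresAt(now), executorId, now);
> > > >
> > > >     if (result.type() == ReserveResultType.OWNED) {
> > > >       // We won the INSERT ... ON CONFLICT race: execute the mutation,
> > > >       // heart-beating while IN_PROGRESS, then finalize with the response.
> > > >       try (AutoCloseable heartbeat = startHeartbeat(store, realmId, key)) {
> > > >         Response response = proceed(request);
> > > >         store.finalize(realmId, key, response); // persists http_status / response_*
> > > >         return response;
> > > >       }
> > > >     }
> > > >
> > > >     // DUPLICATE: another request owns (or owned) this key.
> > > >     IdempotencyRecord record = result.existingRecord();
> > > >     if (record.isFinalized()) {
> > > >       return replay(record); // stored result, no re-execution
> > > >     }
> > > >     if (Duration.between(record.heartbeatAt(), now).compareTo(leaseTtl) <= 0) {
> > > >       // Owner looks alive: bounded poll for FINALIZED, then replay.
> > > >       return replay(pollUntilFinalized(store, realmId, key));
> > > >     }
> > > >     // Stale heartbeat: owner presumed dead; hand the record to the
> > > >     // Reconciler (finalize-gap check or takeover), described below.
> > > >     return reconciler.resolve(record, request);
> > > >   }
> > > >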
> > > > Original node crashes without finalizing the record
> > > >
> > > >    - In that case we still have an IN_PROGRESS row with an old
> > > >    heartbeat_at and the original executor_id.
> > > >    - After lease_ttl has passed, subsequent requests with the same
> > > >    key see IN_PROGRESS and a stale heartbeat (now − heartbeat_at >
> > > >    lease_ttl), so the filter treats the owner as dead and invokes a
> > > >    follow‑up Reconciler SPI.
> > > >    - The Reconciler does “finalize‑gap” reconciliation: it inspects
> > > >    the catalog state for the (operationType, resourceId, realmId) to
> > > >    decide whether the original mutation actually landed (see the
> > > >    sketch after this list).
> > > >       - If the state already reflects the intended change, it can
> > > >       mark the record FINALIZED and allow duplicates to replay an
> > > >       equivalent success response without re‑executing the commit.
> > > >       - If the state does not reflect the change, it can re‑execute
> > > >       once on the new node (“takeover”).
> > > >    - The current PR just introduces the fields needed for this
> > > >    (executor_id, heartbeat_at, http_status, finalized_at,
> > > >    response_*); the filter + Reconciler wiring comes in a follow‑up
> > > >    change.
> > > >
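> > > > As an illustration of that reconciliation step, the finalize-gap
> > > > logic could look roughly like this. Names such as Outcome,
> > > > catalogStateReflects, finalizeAsSuccess and takeOver are placeholders;
> > > > the real Reconciler SPI and its wiring come in the follow-up change.
> > > >
> > > >   // Called when an IN_PROGRESS record's heartbeat is older than lease_ttl.
> > > >   Outcome reconcile(IdempotencyRecord stale, Request retry) {
> > > >     // Did the original mutation actually land in the catalog?
> > > >     boolean landed = catalogStateReflects(
> > > >         stale.operationType(), stale.resourceId(), stale.realmId());
> > > >
> > > >     if (landed) {
> > > >       // Yes: mark the record FINALIZED so duplicates replay an
> > > >       // equivalent success response without re-executing the commit.
> > > >       store.finalizeAsSuccess(stale);
> > > >       return Outcome.replay(stale);
> > > >     }
> > > >
> > > >     // No: take over ownership (new executor_id, fresh heartbeat_at)
> > > >     // and re-execute the mutation once on this node.
> > > >     store.takeOver(stale, executorId, now);
> > > >     return Outcome.execute(retry);
> > > >   }
> > > >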
> > > > Cleanup of old idempotency records / collaboration across nodes
> > > >
> > > >    - On first acceptance we set expires_at = firstAcceptance +
> > > >    idempotency-key-lifetime + grace. This is a long‑lived retention
> > > >    horizon, separate from the short liveness lease:
> > > >       - lease_ttl (seconds) → “is the current owner still alive?”
> > > >       - expires_at (minutes/hours+) → “how long must we remember
> > > >       this key for duplicates/replay and when can we delete it?”
> > > >    - The SPI exposes purgeExpired(before), which is implemented as
> > > >    DELETE FROM idempotency_records WHERE expires_at < ?. Any node
> > > >    (or a maintenance job) can call this periodically. The delete is
> > > >    idempotent and keyed only by time, so multiple nodes can run
> > > >    cleanup concurrently without coordination; it’s intentionally a
> > > >    collaborative, best‑effort process. (A minimal sketch follows
> > > >    this list.)
> > > >
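> > > > As a minimal sketch, the cleanup could be as simple as the following
> > > > JDBC call. This is illustrative only: imports from java.sql / javax.sql
> > > > and error handling are omitted, and the PostgresIdempotencyStore in the
> > > > PR is the source of truth for the actual implementation.
> > > >
> > > >   // expires_at is set once on first acceptance, e.g.
> > > >   //   expiresAt = firstAcceptance.plus(idempotencyKeyLifetime).plus(grace)
> > > >   int purgeExpired(DataSource dataSource, Instant before) throws SQLException {
> > > >     String sql = "DELETE FROM idempotency_records WHERE expires_at < ?";
> > > >     try (Connection conn = dataSource.getConnection();
> > > >          PreparedStatement ps = conn.prepareStatement(sql)) {
> > > >       ps.setTimestamp(1, Timestamp.from(before));
> > > >       // Keyed only by time, so any node (or several) can run this
> > > >       // concurrently without coordination.
> > > >       return ps.executeUpdate();
> > > >     }
> > > >   }
> > > >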
> > > > Thanks,
> > > > Huaxin
> > > >
> > > >
> > > > On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Huaxin,
> > > > >
> > > > > I looked at the PR briefly. It appears to add a persistence SPI
> > > > > for storing information related to the processing of Idempotency
> > > > > Keys.
> > > > >
> > > > > I do not have any particular comment about that interface itself
> > > > > ATM, but I suppose it will be used (eventually) in conjunction with
> > > > > coordinating request processing in multiple Polaris server nodes.
> > > > > This aspect is not apparent in PR 3205 and IIRC it is not discussed
> > > > > in depth in the proposal doc (please correct me if I missed that).
> > > > > The updateHeartbeat() method is declared, but how it is supposed to
> > > > > be used is not clear to me.
> > > > >
> > > > > So, in order to better understand how the proposed persistence SPI
> > > > > can work end-to-end, could you expand on your proposal for
> > > > > coordinating retry processing across nodes and how that will relate
> > > > > to persistence?
> > > > >
> > > > > Specifically, I mean the case when a retry is submitted by a client
> > > > > to one node while another node is still processing the original
> > > > > request. Another case is when the original request node crashes
> > > > > without updating the persisted data.
> > > > >
> > > > > Additionally, what is your vision regarding the cleanup of old
> > > > > Idempotency records? Is this a collaborative process where all
> > > > > server nodes participate equally?
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Following the earlier idempotency proposal, I’ve opened a PR
> > > > > > <https://github.com/apache/polaris/pull/3205> that adds the first
> > > > > > implementation of idempotency retries to Polaris core and would
> > > > > > like feedback on the Java design.
> > > > > >
> > > > > > The PR introduces:
> > > > > >
> > > > > >    - org.apache.polaris.core.persistence.IdempotencyStore – an
> > > > > >    SPI to reserve/load/finalize/purge idempotency records (with
> > > > > >    ReserveResult / ReserveResultType); a rough sketch of the SPI
> > > > > >    shape follows this list.
> > > > > >    - org.apache.polaris.idempotency.IdempotencyRecord – the value
> > > > > >    type for stored idempotency metadata.
> > > > > >    - PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a
> > > > > >    relational‑jdbc implementation backed by
> > > > > >    POLARIS_SCHEMA.idempotency_records, tested via
> > > > > >    Testcontainers/Postgres.
> > > > > >
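> > > > > > Roughly, the SPI shape is along these lines (simplified sketch:
> > > > > > parameter types are my shorthand here, and the PR itself has the
> > > > > > actual signatures):
> > > > > >
> > > > > >   public interface IdempotencyStore {
> > > > > >     // Atomically reserve a key for this node: OWNED if we won the
> > > > > >     // race, DUPLICATE (plus the existing record) otherwise.
> > > > > >     ReserveResult reserve(String realmId, String key, String operationType,
> > > > > >         String normalizedResourceId, Instant expiresAt, String executorId,
> > > > > >         Instant now);
> > > > > >
> > > > > >     Optional<IdempotencyRecord> load(String realmId, String key);
> > > > > >
> > > > > >     // Keep heartbeat_at fresh while the owner is still IN_PROGRESS.
> > > > > >     void updateHeartbeat(String realmId, String key, String executorId,
> > > > > >         Instant now);
> > > > > >
> > > > > >     // Record the final http_status / response payload for later replay.
> > > > > >     void finalize(String realmId, String key, IdempotencyRecord finalized);
> > > > > >
> > > > > >     // Delete records whose expires_at is older than the cutoff.
> > > > > >     int purgeExpired(Instant before);
> > > > > >   }
> > > > > >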
> > > > > > Please take a look at the PR and reply here or comment on the
> > > > > > change with any concerns or suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Huaxin
> > > > > >
> > > > >
> > >
>
