Hi,

Ah, I just saw this email thread, but I had already replied to the other
discussion thread earlier.
Can we keep the discussion of this topic to a single thread?

On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]> wrote:
>
> Hi Dmitri,
>
> Thanks a lot for the review and for calling this out! You’re right that PR
> 3205 only shows the persistence SPI, so the cross‑node behavior isn’t
> obvious from the code alone. I’ve updated the design doc with a new section
> “Multi‑node Coordination, Liveness, and Cleanup” that spells this out, but
> let me summarize the answers to your questions here:
>
> Retry hits a different node while the original is still running
>
>    - Each request with an Idempotency-Key goes through an HTTP idempotency
>    filter (IdempotencyFilter) that calls reserve(realmId, key, operationType,
>    normalizedResourceId, expiresAt, executorId, now).
>    - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING on
>    (realm_id, idempotency_key), so only one node wins globally (OWNED); all
>    others see DUPLICATE + the existing record.
>    - The winning node is the owner and periodically calls
>    updateHeartbeat(...) while the record is IN_PROGRESS, which keeps
>    heartbeat_at fresh.
>    - When a duplicate lands on another node, it looks at heartbeat_at and a
>    short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤ lease_ttl,
>    the original owner is treated as active. The duplicate does not re‑execute
>    the mutation; it waits (with a small bounded poll) for the record to become
>    FINALIZED, then replays the stored result (see the sketch after this list).
>
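> To make the filter-side flow above a bit more concrete, here is a rough Java
> sketch of what a node might do with the reserve() outcome. It is only an
> illustration of the behavior described in this thread, not the code in PR
> 3205; helpers such as store, chain, startHeartbeat/stopHeartbeat,
> finalizeRecord, waitForFinalizedAndReplay and reconciler.takeOverOrReplay,
> as well as the accessors on ReserveResult and IdempotencyRecord, are
> hypothetical names:
>
>     ReserveResult result = store.reserve(
>         realmId, key, operationType, normalizedResourceId, expiresAt, executorId, now);
>
>     if (result.type() == ReserveResultType.OWNED) {
>       // This node won the INSERT … ON CONFLICT race: run the mutation and keep
>       // heartbeat_at fresh via periodic updateHeartbeat(...) until finalized.
>       startHeartbeat(store, realmId, key, executorId);
>       try {
>         Response response = chain.proceed(request);
>         finalizeRecord(store, realmId, key, response);
>         return response;
>       } finally {
>         stopHeartbeat(realmId, key);
>       }
>     }
>
>     // DUPLICATE: another node owns this key.
>     IdempotencyRecord existing = result.record();
>     if (Duration.between(existing.heartbeatAt(), now).compareTo(leaseTtl) <= 0) {
>       // Owner looks alive (now − heartbeat_at ≤ lease_ttl): never re-execute;
>       // poll briefly for FINALIZED and replay the stored response.
>       return waitForFinalizedAndReplay(store, realmId, key, pollTimeout);
>     }
>     // Stale heartbeat: owner presumed dead, hand off to the Reconciler (next section).
>     return reconciler.takeOverOrReplay(realmId, key, existing);
>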
> Original node crashes without finalizing the record
>
>    - In that case we still have an IN_PROGRESS row with an old heartbeat_at
>    and the original executor_id.
>    - After lease_ttl has passed, subsequent requests with the same key see
>    IN_PROGRESS and a stale heartbeat (now − heartbeat_at > lease_ttl), so
>    the filter treats the owner as dead and invokes a follow‑up Reconciler
>    SPI.
>    - The Reconciler does “finalize‑gap” reconciliation: it inspects the
>    catalog state for the (operationType, resourceId, realmId) to decide
>    whether the original mutation actually landed.
>       - If the state already reflects the intended change, it can mark the
>       record FINALIZED and allow duplicates to replay an equivalent success
>       response without re‑executing the commit.
>       - If the state does not reflect the change, it can re‑execute once on
>       the new node (“takeover”).
>    - The current PR just introduces the fields needed for this (executor_id,
>    heartbeat_at, http_status, finalized_at, response_*); the filter +
>    Reconciler wiring comes in a follow‑up change (a rough sketch of the
>    reconciliation step follows this list).
>
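> Roughly, the reconciliation decision could look like the Java sketch below.
> Again, this is illustrative only; Reconciler, catalogStateReflects(...),
> finalizeAsSuccess(...) and reExecuteAndFinalize(...) are hypothetical names,
> not the actual follow‑up code:
>
>     // Invoked when a duplicate finds an IN_PROGRESS record whose heartbeat is
>     // older than lease_ttl, i.e. the original owner is presumed dead.
>     Response reconcile(IdempotencyRecord stale, Request request) {
>       boolean landed = catalogStateReflects(
>           stale.operationType(), stale.resourceId(), stale.realmId());
>       if (landed) {
>         // Finalize-gap: the mutation committed but the record was never
>         // finalized. Mark it FINALIZED so duplicates replay an equivalent
>         // success response without re-running the commit.
>         return finalizeAsSuccess(stale);
>       }
>       // The mutation never landed: re-execute exactly once on this node
>       // ("takeover") and finalize the record with the real result.
>       return reExecuteAndFinalize(stale, request);
>     }
>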
> Cleanup of old idempotency records / collaboration across nodes
>
>    - On first acceptance we set expires_at = firstAcceptance +
>    idempotency-key-lifetime + grace. This is a long‑lived retention
>    horizon, separate from the short liveness lease:
>       - lease_ttl (seconds) → “is the current owner still alive?”
>       - expires_at (minutes/hours+) → “how long must we remember this key
>       for duplicates/replay and when can we delete it?”
>    - The SPI exposes purgeExpired(before), implemented as DELETE FROM
>    idempotency_records WHERE expires_at < ?. Any node (or a maintenance job)
>    can call this periodically. The delete is idempotent and keyed only by
>    time, so multiple nodes can run cleanup concurrently without coordination;
>    it’s intentionally a collaborative, best‑effort process (sketched below).
>
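> For the cleanup side, the intent is that any node (or an external maintenance
> job) can run something like the sketch below on a timer. Only
> purgeExpired(before) comes from the SPI; the scheduling annotation, clock,
> logger, and the int return value are assumptions for illustration:
>
>     // Best-effort retention cleanup. Safe to run concurrently on every node:
>     // the delete is keyed purely by expires_at, so overlapping runs simply
>     // delete disjoint (or already-deleted) rows.
>     @Scheduled(every = "15m")
>     void purgeOldIdempotencyRecords() {
>       int deleted = idempotencyStore.purgeExpired(clock.instant());
>       LOG.debug("Purged {} expired idempotency records", deleted);
>     }
>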
> Thanks,
> Huaxin
>
>
> On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi Huaxin,
> >
> > I looked at the PR briefly. It appears to add a persistence SPI for storing
> > information related to the processing of Idempotency Keys.
> >
> > I do not have any particular comment about that interface itself ATM, but I
> > suppose it will be used (eventually) in conjunction with coordinating
> > request processing in multiple Polaris server nodes. This aspect is not
> > apparent in PR 3205 and IIRC it is not discussed in depth in the proposal
> > doc (please correct me if I missed that). The updateHeartbeat() method is
> > declared, but how it is supposed to be used is not clear to me.
> >
> > So, in order to better understand how the proposed persistence SPI can work
> > end-to-end, could you expand on your proposal for coordinating retry
> > processing across nodes and how that will relate to persistence?
> >
> > Specifically, I mean the case when a retry is submitted by a client to one
> > node while another node is still processing the original request. Another
> > case is when the original request node crashes without updating the
> > persisted data.
> >
> > Additionally, what is your vision regarding the cleanup of old Idempotency
> > records? Is this a collaborative process where all server nodes participate
> > equally?
> >
> > Thanks,
> > Dmitri.
> >
> > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > Following the earlier idempotency proposal, I’ve opened a PR
> > > <https://github.com/apache/polaris/pull/3205> that adds the first
> > > implementation of idempotency retries to Polaris core and would like
> > > feedback on the Java design.
> > >
> > > The PR introduces:
> > >
> > >    - org.apache.polaris.core.persistence.IdempotencyStore – an SPI to
> > >    reserve/load/finalize/purge idempotency records (with ReserveResult /
> > >    ReserveResultType).
> > >    - org.apache.polaris.idempotency.IdempotencyRecord – the value type for
> > >    stored idempotency metadata.
> > >    - PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a
> > >    relational‑jdbc implementation backed by
> > >    POLARIS_SCHEMA.idempotency_records, tested via Testcontainers/Postgres.
> > >
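> > > In rough shape, the SPI reads like the sketch below (parameter types and
> > > names simplified for the email and not necessarily matching the PR; please
> > > treat the PR itself as the source of truth):
> > >
> > >     public interface IdempotencyStore {
> > >       // Atomically claim a key; OWNED for the winner, DUPLICATE (plus the
> > >       // existing record) for everyone else.
> > >       ReserveResult reserve(String realmId, String key, String operationType,
> > >           String normalizedResourceId, Instant expiresAt, String executorId,
> > >           Instant now);
> > >
> > >       Optional<IdempotencyRecord> load(String realmId, String key);
> > >
> > >       void updateHeartbeat(String realmId, String key, String executorId, Instant now);
> > >
> > >       void finalize(String realmId, String key, int httpStatus, byte[] responseBody);
> > >
> > >       int purgeExpired(Instant before);
> > >     }
> > >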
> > > Please take a look at the PR and reply here or comment on the change with
> > > any concerns or suggestions.
> > >
> > > Thanks,
> > >
> > > Huaxin
> > >
> >
