Cross-mail-race-conditions happen ;)

On Mon, Dec 8, 2025 at 7:53 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> Sorry, Robert, it might be my fault for causing this new thread. My
> intention was to have a more focused discussion about java code changes
> here.
>
> Cheers,
> Dmitri.
>
> On Mon, Dec 8, 2025 at 5:57 AM Robert Stupp <[email protected]> wrote:
>
> > Hi,
> >
> > ah, just saw this email thread, but I replied to the other discussion
> > thread earlier.
> > Can we consider discussing this topic in just one thread?
> >
> > On Sun, Dec 7, 2025 at 2:19 AM huaxin gao <[email protected]> wrote:
> > >
> > > Hi Dmitri,
> > >
> > > Thanks a lot for the review and for calling this out!  You’re right that
> > PR
> > > 3205 only shows the persistence SPI, so the cross‑node behavior isn’t
> > > obvious from the code alone.I’ve updated the design doc with a new
> > section
> > > “Multi‑node Coordination, Liveness, and Cleanup” that spells this out,
> > but
> > > let me summarize the answers to your questions here:
> > >
> > > Retry hits a different node while the original is still running
> > >
> > >    - Each request with an Idempotency-Key goes through an HTTP
> > idempotency
> > >    filter (IdempotencyFilter) that calls
> > >
> > > reserve(realmId, key, operationType, normalizedResourceId, expiresAt,
> > > executorId, now).
> > >
> > >
> > >    - The Postgres implementation uses INSERT … ON CONFLICT DO NOTHING
> > > on (realm_id,
> > >    idempotency_key), so only one node wins globally (OWNED); all others
> > see
> > >    DUPLICATE + the existing record.
> > >    - The winning node is the owner and periodically calls
> > >    updateHeartbeat(...) while the record is IN_PROGRESS, which keeps
> > >    heartbeat_at fresh.
> > >    - When a duplicate lands on another node, it looks at heartbeat_at
> > and a
> > >    short lease (lease_ttl, e.g. ~25s): if now − heartbeat_at ≤ lease_ttl,
> > >    the original owner is treated as active. The duplicate does not
> > re‑execute
> > >    the mutation; it waits (with a small bounded poll) for the record to
> > become
> > >    FINALIZED, then replays the stored result.
> > >
> > > Original node crashes without finalizing the record
> > >
> > >    - In that case we still have an IN_PROGRESS row with an old
> > heartbeat_at
> > >    and the original executor_id.
> > >    - After lease_ttl has passed, subsequent requests with the same key
> > see
> > >    IN_PROGRESS and a stale heartbeat (now − heartbeat_at > lease_ttl), so
> > >    the filter treats the owner as dead and invokes a follow‑up Reconciler
> > >    SPI.
> > >    - The Reconciler does “finalize‑gap” reconciliation: it inspects the
> > >    catalog state for the (operationType, resourceId, realmId) to decide
> > >    whether the original mutation actually landed.
> > >       - If the state already reflects the intended change, it can mark
> > the
> > >       record FINALIZED and allow duplicates to replay an equivalent
> > success
> > >       response without re‑executing the commit.
> > >       - If the state does not reflect the change, it can re‑execute once
> > on
> > >       the new node (“takeover”).
> > >    - The current PR just introduces the fields needed for this
> > (executor_id,
> > >    heartbeat_at, http_status, finalized_at, response_*); the filter +
> > >    Reconciler wiring comes in a follow‑up change.
> > >
> > > Cleanup of old idempotency records / collaboration across nodes
> > >
> > >    - On first acceptance we set expires_at = firstAcceptance +
> > >    idempotency-key-lifetime + grace. This is a long‑lived retention
> > >    horizon, separate from the short liveness lease:
> > >       - lease_ttl (seconds) → “is the current owner still alive?”
> > >       - expires_at (minutes/hours+) → “how long must we remember this key
> > >       for duplicates/replay and when can we delete it?”
> > >    - The SPI exposes purgeExpired(before) which is implemented as DELETE
> > >    FROM idempotency_records WHERE expires_at < ?. Any node (or a
> > >    maintenance job) can call this periodically. The delete is idempotent
> > and
> > >    keyed only by time, so multiple nodes can run cleanup concurrently
> > without
> > >    coordination; it’s intentionally a collaborative, best‑effort process.
> > >
> > > Thanks,
> > > Huaxin
> > >
> > >
> > > On Fri, Dec 5, 2025 at 4:56 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> > >
> > > > Hi Huaxin,
> > > >
> > > > I looked at the PR briefly. It appears to add a persistence SPI for
> > storing
> > > > information related to the processing of Idempotency Keys.
> > > >
> > > > I do not have any particular comment about that interface itself ATM,
> > but I
> > > > suppose it will be used (eventually) in conjunction with coordinating
> > > > request processing in multiple Polaris server nodes. This aspect is not
> > > > apparent in PR 3205 and IIRC it is not discussed in depth in the
> > proposal
> > > > doc (please correct me if I missed that). The updateHeartbeat() method
> > is
> > > > declared, but how it is supposed to be used is not clear to me.
> > > >
> > > > So, in order to better understand how the proposed persistence SPI can
> > work
> > > > end-to-end, could you expand on your proposal for coordinating retry
> > > > processing across nodes and how that will relate to persistence?
> > > >
> > > > Specifically, I mean the case when a retry is submitted by a client to
> > one
> > > > node while another node is still processing the original request.
> > Another
> > > > case is when the original request node crashes without updating the
> > > > persisted data.
> > > >
> > > > Additionally, what is your vision regarding the cleanup of old
> > Idempotency
> > > > records? Is this a collaborative process where all server nodes
> > participate
> > > > equally?
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Thu, Dec 4, 2025 at 4:53 PM huaxin gao <[email protected]>
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Following the earlier idempotency proposal, I’ve opened a PR
> > > > > <https://github.com/apache/polaris/pull/3205> that adds the first
> > > > > implementation of idempotency retries to Polaris core and would like
> > > > > feedback on the Java design.
> > > > >
> > > > > The PR introduces:
> > > > >
> > > > >    -
> > > > >
> > > > >    org.apache.polaris.core.persistence.IdempotencyStore – an SPI to
> > > > >    reserve/load/finalize/purge idempotency records (with
> > ReserveResult /
> > > > >    ReserveResultType).
> > > > >
> > > > >
> > > > >    -
> > > > >
> > > > >    org.apache.polaris.idempotency.IdempotencyRecord – the value type
> > for
> > > > >    stored idempotency metadata.
> > > > >
> > > > >
> > > > >    -
> > > > >
> > > > >    PostgresIdempotencyStore + PostgresIdempotencyStoreIT – a
> > > > >    relational‑jdbc implementation backed by
> > > > >    POLARIS_SCHEMA.idempotency_records, tested via
> > > > Testcontainers/Postgres.
> > > > >
> > > > > Please take a look at the PR and reply here or comment on the change
> > with
> > > > > any concerns or suggestions.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Huaxin
> > > > >
> > > >
> >

Reply via email to