Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

Dmitri Bourlatchkov Tue, 05 May 2026 14:29:51 -0700

Hi Huaxin, Yufei,

I left some comments on [4269] mostly about SPI design concerns.


Since we wanted to reopen a more general SPI discussion anyway, it might be
worth having a dedicated sync about that...

Alternatively, we could have that discussion by email now, if you prefer. I
have some starting points to share for discussion.

WDYT?

[4269] https://github.com/apache/polaris/pull/4269

Thanks,
Dmitri.

On Fri, Apr 17, 2026 at 7:07 PM huaxin gao <[email protected]> wrote:

> Hi Robert,
>
> Thanks, and I’m glad we’re aligned on the handler-level direction. I also
> appreciate the detailed review and feedback throughout this discussion.
>
> Since the old implementation is going away and the handler-level
> implementation will be in a fresh PR, would you be OK removing the -1 from
> PR #3803 <https://github.com/apache/polaris/pull/3803>? If so, I’ll close
> the old PR and open a new one for the handler-level approach.
>
> Best,
>
> Huaxin
>
> On Thu, Apr 16, 2026 at 12:57 AM Robert Stupp <[email protected]> wrote:
>
> > i Yufei, Huaxin,
> >
> > Thanks for the thoughtful replies, and for opening the issue for
> > `createTableStaged` [1]. I’m glad we’re converging on the handler-level
> > direction.
> >
> > I agree benchmarking can run in parallel with implementation. For my own
> > review, I’d prefer to wait until reproducible benchmark results are
> posted
> > (using the scope from my April 14 note, or with any scope changes called
> > out).
> >
> > Happy to review once that is available.
> >
> > Best,
> > Robert
> >
> > [1] https://github.com/apache/polaris/issues/4200
> >
> >
> > On Tue, Apr 14, 2026 at 7:54 PM huaxin gao <[email protected]>
> wrote:
> >
> > > Hi Robert,
> > >
> > > Thanks for the feedback. Glad the handler-level design addresses the
> > > concerns.
> > >
> > > I also feel we should converge on the design and implementation first,
> > and
> > > run benchmarks in parallel. The access patterns will be different with
> > the
> > > handler-level approach, so benchmarking the current filter-based code
> > > wouldn't be representative.
> > >
> > > I'll open a GitHub issue to track createTableStaged support, and I'll
> > open
> > > a fresh PR for the handler-level implementation.
> > >
> > > Best,
> > >
> > > Huaxin
> > >
> > > On Tue, Apr 14, 2026 at 10:30 AM Yufei Gu <[email protected]>
> wrote:
> > >
> > > > Hi Robert,
> > > >
> > > > Thanks for the thorough review and for clearly laying out the
> concerns,
> > > > this is very helpful.
> > > >
> > > > For point 1(Q3), I do not think this needs to be a hard prerequisite
> > > before
> > > > looking at the implementation.
> > > >
> > > > The handler level design and correctness concerns are largely
> > independent
> > > > from the exact performance characteristics. It would be more
> productive
> > > to
> > > > first converge on the right architecture, then validate and iterate
> on
> > > > performance with concrete measurements. Also, while the store is
> > reused,
> > > > the access pattern and critical sections change with the handler
> level
> > > > approach. Measuring the current implementation in isolation may not
> > > reflect
> > > > the actual behavior of the final design.
> > > >
> > > > We already have a similar situation with the NoSQL benchmark, where
> the
> > > > code has been merged for a while but the benchmarking work is still
> in
> > > > progress. That has not blocked design or code review, and we are
> > > iterating
> > > > on performance as we go.
> > > >
> > > > I agree that realistic benchmarking is important. But I would suggest
> > we
> > > > treat this as a parallel track, not a gate. Otherwise we risk
> blocking
> > > > progress on the design without knowing whether the measured overhead
> is
> > > > even representative of the final system.
> > > >
> > > > Happy to help define a benchmark plan once the design is settled.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Tue, Apr 14, 2026 at 6:28 AM Robert Stupp <[email protected]> wrote:
> > > >
> > > > > Hi Huaxin,
> > > > >
> > > > > yeah, the handler-level approach is the right call, I'm glad you
> went
> > > > with
> > > > > it.
> > > > >
> > > > > To be explicit about what your proposal aims to resolve:
> > > > >
> > > > > - Authorization bypass is fixed. The check now runs after
> > > resource-level
> > > > > authorization.
> > > > > - No credentials in the database. Fresh credentials re-vended on
> > replay
> > > > via
> > > > > the normal vending path.
> > > > > - Phase 2 architecture problem gone. The handler has the principal,
> > > > storage
> > > > > config, and IAM access that the filter never could have.
> > > > > - Caller identity is in the binding.
> > > > >
> > > > > Three things still need adressing:
> > > > >
> > > > > First, and this is a hard prerequisite before I look at any
> > > > implementation
> > > > > code: Q3 from April 5 is still open. The store is being reused
> as-is,
> > > so
> > > > > the overhead can be measured right now without new handler code. 3+
> > DB
> > > > > round-trips per mutation on the catalog connection pool, write-hot
> > > table,
> > > > > polling loops under duplicate traffic - those numbers need to come
> > > from a
> > > > > realistic setup: networked database under concurrent load, not a
> > local
> > > > > Postgres. The latency concerns are specifically about Aurora and
> > > > > CockroachDB, and the polling storm only shows up under concurrent
> > > > duplicate
> > > > > traffic. Please share the methodology so the results are
> > reproducible.
> > > > >
> > > > > Second: please open a GitHub issue for the createTableStaged
> > deferral.
> > > > >
> > > > > Third: since the filter is going away entirely and the critical
> > > sections
> > > > > move to the handlers, does it make sense to close #3803 and open a
> > > fresh
> > > > > PR? The old diff is mostly code that's being deleted, and it
> carries
> > a
> > > > lot
> > > > > of review history. A clean PR would be easier for everyone to
> review
> > -
> > > > just
> > > > > reference the old one for context.
> > > > >
> > > > > Happy to do a fresh review once the new implementation is up.
> > > > >
> > > > > Best,
> > > > > Robert
> > > > >
> > > > >
> > > > > On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > Hi Robert,
> > > > > >
> > > > > > Here is a revised design proposal for the idempotency replay
> > > mechanism.
> > > > > I'd
> > > > > > appreciate your feedback. Thanks!
> > > > > >
> > > > > > *Moving idempotency to handler level*
> > > > > >
> > > > > > The idempotency check moves from the HTTP filter into the handler
> > > > > methods,
> > > > > > after authorization. The filter is removed. The caller's
> principal
> > > hash
> > > > > is
> > > > > > included in the idempotency binding and checked on replay.
> > > > > >
> > > > > > *Credential-bearing mutations (createTable)*
> > > > > >
> > > > > > On replay, instead of returning a stored response, the handler
> > calls
> > > > > > buildLoadTableResponseWithDelegationCredentials with the existing
> > > > table's
> > > > > > metadata. This vends fresh credentials for the current caller
> > through
> > > > the
> > > > > > normal credential vending path. No credentials are stored in the
> > > > > database —
> > > > > > the idempotency record only stores the key, principal hash, table
> > > > > > identifier, and status code.
> > > > > >
> > > > > > *Non-credential mutations (dropTable, renameTable, namespace/view
> > > > > > operations)*
> > > > > >
> > > > > > These do not vend credentials, so there is no leakage concern. On
> > > > replay,
> > > > > > the handler returns the stored status code and minimal response
> > body.
> > > > > >
> > > > > > *createTableStaged*
> > > > > >
> > > > > > Staged tables are not persisted to the catalog, so the method can
> > > > safely
> > > > > > re-run on retry. Idempotency support for this endpoint can be
> > > deferred.
> > > > > >
> > > > > > *What is reused from the current implementation*
> > > > > >
> > > > > > The IdempotencyStore interface, database schema, configuration,
> and
> > > > > > background purge logic are reused. The main change is moving the
> > > > > > check/finalize logic from the filter into the handler.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Huaxin
> > > > > >
> > > > > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao <
> [email protected]
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Robert,
> > > > > > >
> > > > > > > Thanks for the follow-up.
> > > > > > >
> > > > > > > I appreciate the clarification. I read the listed items as fix
> > > > options
> > > > > > and
> > > > > > > picked option 2, and the Phase 1/Phase 2 split was purely to
> keep
> > > the
> > > > > PR
> > > > > > at
> > > > > > > a reviewable size, not a design choice.
> > > > > > >
> > > > > > > I agree that the caller's identity should be part of the
> binding.
> > > > That
> > > > > is
> > > > > > > straightforward to add.
> > > > > > >
> > > > > > > On the architecture question, whether re-vending can work
> within
> > > the
> > > > > > > filter or needs a handler-level approach, I want to give this
> > > proper
> > > > > > > thought rather than rush an answer. I may not be able to get
> back
> > > to
> > > > > you
> > > > > > on
> > > > > > > this until next week (the week of 4/13).
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Huaxin
> > > > > > >
> > > > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <[email protected]>
> > > wrote:
> > > > > > >
> > > > > > >> Hi Huaxin,
> > > > > > >>
> > > > > > >> Thanks for responding. A few corrections and clarifications.
> > > > > > >>
> > > > > > >> My PR review described three consequences of the missing
> caller
> > > > > > >> identity binding and the credential leakage attack vector.
> Those
> > > > > > >> were not suggestions for a Phase 1/Phase 2 split. The offline
> > > > > > >> decision to split into phases, and what each phase contains,
> was
> > > > > > >> yours [1].
> > > > > > >>
> > > > > > >> On the phasing: the concern is not PR size. The concern is
> that
> > > > > > >> Phase 1 alone ships a full authorization bypass. The filter
> > > replays
> > > > > > >> cached responses before Polaris's resource-level authorization
> > > > > > >> runs. Stripping credentials does not fix that. Any
> authenticated
> > > > > > >> user with a valid idempotency key can receive a cached
> response
> > > > > > >> for a resource they are not authorized to access. That is the
> > case
> > > > > > >> with or without credentials in the response.
> > > > > > >>
> > > > > > >> "We will not release it until entire idempotency coding is
> > > > > > >> finished" is reassuring, but the architecture question
> remains:
> > > > > > >> can the generic filter approach support re-vending at all? My
> > > > > > >> email explained why I believe it cannot, since it would
> require
> > > > > > >> the filter to authorize the caller, resolve storage locations,
> > > > > > >> call cloud IAM, and manage credential caching. That is the
> > catalog
> > > > > > >> handler's job, not a filter's.
> > > > > > >>
> > > > > > >> The questions from my original email that still need answers:
> > > > > > >>
> > > > > > >> How does the binding prevent cross-principal cache hits
> without
> > > > > > >> including the caller's identity?
> > > > > > >>
> > > > > > >> How does Phase 2 re-vend credentials from a generic HTTP
> filter
> > > > > > >> without reimplementing the catalog handler's authorization and
> > > > > > >> credential vending logic?
> > > > > > >>
> > > > > > >> Has the operational overhead (3 DB round-trips per mutation on
> > > > > > >> the catalog's connection pool) been benchmarked on any
> backend?
> > > > > > >>
> > > > > > >> I'd like to resolve these on this thread before more code is
> > > > > > >> written.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Robert
> > > > > > >>
> > > > > > >> [1]
> > > > >
> https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
> > > > > > >>
> > > > > > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > Hi Robert,
> > > > > > >> >
> > > > > > >> > Thanks for raising this on the mailing list. I acknowledge
> the
> > > > > design
> > > > > > >> > concerns you listed.
> > > > > > >> >
> > > > > > >> > To clarify: I picked your option 2 (strip credentials and
> > > re-vend
> > > > on
> > > > > > >> > replay) you suggested in the PR review. The offline
> discussion
> > > > with
> > > > > > >> Yufei
> > > > > > >> > and Prashant was simply to verify that I picked the best
> > option
> > > > from
> > > > > > all
> > > > > > >> > the three suggestions you listed in the review.
> > > > > > >> >
> > > > > > >> > The reason I split it into two phases is that the current PR
> > is
> > > > > > already
> > > > > > >> > around 3000 lines. Adding re-vending logic on top would make
> > it
> > > > very
> > > > > > >> hard
> > > > > > >> > to review. This is a big feature and we need to do it piece
> by
> > > > > piece.
> > > > > > We
> > > > > > >> > will not release it until entire idempotency coding is
> > finished.
> > > > > > >> >
> > > > > > >> > I'm happy to discuss the architecture further on this
> thread,
> > > > > whether
> > > > > > >> the
> > > > > > >> > filter approach can support re-vending, or whether we need a
> > > > > > >> handler-level
> > > > > > >> > design. I may be slow to respond next week but I'll follow
> up
> > > > after
> > > > > > >> that.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Huaxin
> > > > > > >> >
> > > > > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <[email protected]
> >
> > > > wrote:
> > > > > > >> >
> > > > > > >> > > Hi all,
> > > > > > >> > >
> > > > > > >> > > I need to raise some fundamental design concerns with the
> > > > current
> > > > > > >> > > Idempotency-Key implementation in PR #3803 [1]. Some of
> > these
> > > > were
> > > > > > >> > > raised in November 2025 [2] and were acknowledged [3] but
> > > never
> > > > > > >> > > addressed in the implementation.
> > > > > > >> > >
> > > > > > >> > > I raised the caller identity concern on the
> > > > > > >> > > mailing list in November 2025 [2]. In December 2025 [4], I
> > > > > > >> > > reiterated that all existing implementations I could find
> do
> > > > > > >> > > request fingerprinting including the request body, and
> > warned
> > > > that
> > > > > > >> > > the idempotency subsystem itself must not become a source
> of
> > > > > > >> > > failures. The PR was opened in February 2026 without
> > > addressing
> > > > > > >> > > either point.
> > > > > > >> > >
> > > > > > >> > > The credential leakage consequences surfaced during code
> > > review
> > > > > > >> > > [5]. The response was an offline decision between three
> > > > > > contributors,
> > > > > > >> > > without mailing list discussion [6]. The decision was to
> > split
> > > > > > >> > > into Phase 1 (strip credentials, which produces broken
> > > responses
> > > > > > >> > > but does not fix the authorization bypass) and Phase 2
> > > (re-vend
> > > > > > >> > > on replay, which would require a complete redesign and
> > > > > > >> > > reimplementation). Details below.
> > > > > > >> > >
> > > > > > >> > > I think we need to resolve these at the design level
> before
> > > more
> > > > > > >> > > code is written.
> > > > > > >> > >
> > > > > > >> > > I will not do further code reviews on this PR until these
> > > design
> > > > > > >> > > concerns are resolved.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > 1. Caller identity is not part of the idempotency binding
> > > > > > >> > >
> > > > > > >> > > The caller's identity is not part of the binding. The
> > > > > > >> > > filter replays cached responses before Polaris's
> > > resource-level
> > > > > > >> > > authorization runs. Any authenticated user with a valid
> > > > > idempotency
> > > > > > >> > > key can receive a cached response for a resource they are
> > not
> > > > > > >> > > authorized to access, including vended storage
> credentials.
> > > This
> > > > > > >> > > is a full authorization bypass.
> > > > > > >> > >
> > > > > > >> > > Idempotency keys are HTTP headers that end up in access
> > logs,
> > > > > > >> > > observability tooling, debug output. They cannot be a
> > security
> > > > > > >> > > boundary. I described the attack vector in my PR review
> [5].
> > > > > > >> > >
> > > > > > >> > > The proposed Phase 2 [6] says it will "re-authorize the
> user
> > > and
> > > > > > >> > > re-vend fresh credentials on replay." That addresses
> > > credentials
> > > > > > >> > > but not the binding itself.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > 2. Credential stripping breaks Polaris's deployment model
> > > > > > >> > >
> > > > > > >> > > Without mailing list discussion, the decision was made
> > offline
> > > > > > >> > > to strip credentials before storing ("Phase 1") [6].
> > > > > > >> > >
> > > > > > >> > > Polaris's purpose is to vend down-scoped credentials.
> > Clients
> > > > have
> > > > > > >> > > no ambient storage access.
> > > > > > >> > >
> > > > > > >> > > A stripped response returns a successful response ("table
> > > > > created")
> > > > > > >> > > to the client, which then fails hard because it cannot
> > access
> > > > the
> > > > > > >> > > object storage. Query engines (Spark, Trino, etc.) will
> fail
> > > on
> > > > > > >> > > the first storage operation against that table. Query
> engine
> > > > users
> > > > > > >> > > have no way to resolve this, they don't control the REST
> > > catalog
> > > > > > >> > > interaction. For stage-create, the client gets a
> successful
> > > > > > >> > > response telling it to write data but has no credentials
> to
> > > > write
> > > > > > >> > > with. No fallback exists.
> > > > > > >> > >
> > > > > > >> > > The stripped response is spec-compliant, but
> spec-compliant
> > > and
> > > > > > >> > > operationally correct are not the same thing.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > 3. Phase 2 is architecturally incompatible with Phase 1
> > > > > > >> > >
> > > > > > >> > > The filter is a generic HTTP filter below the application
> > > layer.
> > > > > It
> > > > > > >> > > treats responses as opaque JSON. Credential vending
> happens
> > > deep
> > > > > > >> > > inside IcebergCatalogHandler, scoped to specific table
> > > locations
> > > > > > >> > > and principals, through cloud IAM calls (STS AssumeRole,
> GCS
> > > > token
> > > > > > >> > > exchange).
> > > > > > >> > >
> > > > > > >> > > Re-vending on replay would require the filter to resolve
> the
> > > > > > >> > > table's storage location, authorize the caller, call cloud
> > IAM
> > > > to
> > > > > > >> > > mint scoped credentials, and manage credential caching.
> That
> > > is
> > > > > > >> > > not a filter, that is reimplementing the catalog handler's
> > > > > > >> > > authorization and credential vending logic.
> > > > > > >> > >
> > > > > > >> > > Phase 1 and Phase 2 need to be designed and shipped
> > together.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > 4. Operational overhead
> > > > > > >> > >
> > > > > > >> > > Every idempotent mutation adds 3 database round-trips
> > (INSERT,
> > > > > > UPDATE,
> > > > > > >> > > heartbeat UPDATEs), each borrowing its own connection
> > > > > > >> > > (autoCommit=true, no reuse), sharing the catalog's
> > connection
> > > > pool
> > > > > > >> > > and RDBMS. This added latency increases the probability of
> > > > client
> > > > > > >> > > timeouts, which are exactly the transient failures that
> > > > > idempotency
> > > > > > >> > > is supposed to prevent.
> > > > > > >> > >
> > > > > > >> > > The idempotency table is high-churn and write-hot,
> competing
> > > > with
> > > > > > >> > > catalog operations for connection pool capacity. The
> > overhead
> > > > gets
> > > > > > >> > > worse on managed and distributed database backends.
> > Duplicate
> > > > > > >> > > requests trigger polling loops with repeated database
> reads.
> > > > > > >> > >
> > > > > > >> > > No benchmark data has been presented for any backend.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > What needs to change
> > > > > > >> > >
> > > > > > >> > > The current generic-filter approach cannot support these
> > > > > > >> > > requirements. This needs a different design.
> > > > > > >> > >
> > > > > > >> > > A new design must address the following:
> > > > > > >> > >
> > > > > > >> > > a) The idempotency binding MUST include the caller's
> > principal
> > > > > > >> > >    identity.
> > > > > > >> > > b) Credential-bearing responses must not be stored as-is.
> On
> > > > > > >> > >    replay, the handler must build a fresh response with
> > fresh
> > > > > > >> > >    credentials for the current caller.
> > > > > > >> > > c) The operational overhead needs to be benchmarked.
> > > > > > >> > >
> > > > > > >> > > I am happy to review code once the design addresses these
> > > > points.
> > > > > > >> > >
> > > > > > >> > > Best,
> > > > > > >> > > Robert
> > > > > > >> > >
> > > > > > >> > > [1] https://github.com/apache/polaris/pull/3803
> > > > > > >> > > [2]
> > > > > > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l
> > > > > > >> > >     (Nov 24, 2025: "Building a service-internal key from
> the
> > > > > client
> > > > > > >> > >     provided idempotency key plus more data from the
> request
> > > > gives
> > > > > > >> > >     better uniqueness. That more data can come from:
> > operation
> > > > id,
> > > > > > >> > >     all operation parameters, including the whole payload,
> > > > quite a
> > > > > > >> > >     few HTTP request headers like user-agent,
> authorization,
> > > > host,
> > > > > > >> > >     accept-* and also the client's address.")
> > > > > > >> > > [3]
> > > > > > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b
> > > > > > >> > >     (Dec 8, 2025: "I agree that a bare Idempotency-Key +
> > > entity
> > > > > > >> > >     identifier is not enough to protect against buggy or
> > > > malicious
> > > > > > >> > >     clients; fingerprinting the full request would be
> > > > stronger.")
> > > > > > >> > > [4]
> > > > > > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264
> > > > > > >> > >     (Dec 8, 2025: "We should avoid failing Polaris if this
> > > > > subsystem
> > > > > > >> > >     fails, or letting this subsystem be a reason for its
> > > > existence
> > > > > > >> > >     (aka retry due to timeouts because the
> > idempotent-request
> > > > > > >> > >     subsystem hangs).")
> > > > > > >> > > [5]
> > > > > > >> > >
> > > > > > >>
> > > > > >
> > > >
> > https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055
> > > > > > >> > > [6]
> > > > > > >>
> > > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

Reply via email to