Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

huaxin gao Fri, 08 May 2026 16:53:30 -0700

Thanks for the discussion.

I will open a separate refactor PR, following Dennis's sketch and applied
to the persistences already on main. Concretely, the PR will:


   1. Decouple BasePersistence — remove extends PolicyMappingPersistence
   and extends MetricsPersistence so all four (BasePersistence,
   PolicyMappingPersistence, MetricsPersistence, IntegrationPersistence) are
   disjoint top-level SPIs.
   2. Update TransactionalPersistence — add extends
   PolicyMappingPersistence and extends MetricsPersistence so it keeps
   aggregating the same set of behaviors after step 1.
   3. Update concrete impls — add PolicyMappingPersistence and
   MetricsPersistence to the implements list of JdbcBasePersistenceImpl and
   NonFunctionalBasePersistence / NoSqlMetaStore. Method bodies don't change,
   only the declared interfaces.
   4. Add factory accessors on MetaStoreManagerFactory —
   getOrCreateBasePersistence, getOrCreatePolicyMappingPersistence,
   getOrCreateMetricsPersistence, getOrCreateIntegrationPersistence. Rename
   the existing getOrCreateSession() accordingly.
   5. Mirror accessors on PolarisCallContext — getBasePersistence,
   getPolicyMappingPersistence, getMetricsPersistence,
   getIntegrationPersistence. Rename the existing getMetaStore() accordingly.
   6. Remove the IntegrationPersistence blind cast — replace
   ((IntegrationPersistence) ms) in AtomicOperationMetaStoreManager with
   callCtx.getIntegrationPersistence().

Once that lands, I'll rebase my idempotency PR on top so
IdempotencyPersistence follows the same pattern from the start (added as a
fifth disjoint top-level SPI, with one new accessor on each of
MetaStoreManagerFactory and PolarisCallContext, and JdbcBasePersistenceImpl
getting one additional implements entry — no BasePersistence extends
IdempotencyPersistence).

I'll start a draft of the refactor PR and loop your folks in once it's open.

Thanks,

Huaxin

On Wed, May 6, 2026 at 6:09 PM Dennis Huo <[email protected]> wrote:

> TL;DR: I agree with needing to fix the interface decomposition and that it
> shouldn't be too hard. I could go either way on whether it's done in the
> existing PR though -- we probably don't want to introduce yet another
> pattern if we might forget to refactor the existing MetricsPersistence and
> PolicyMappingPersistence to match the same pattern. Re: coupling, I'm not
> sure they're entirely "unrelated" but the coupling should be expressed
> differently than making BasePersistence a big interface-aggregator anyways,
> so I still agree with the refactor.
>
> Regarding the IdempotencyPersistence/BasePersistence hierarchy, though I
> don't feel too strongly either way in the short term for expediency (since
> internal layout consistency is generally good, and proper interface
> decomposition could be done across all the "modular" *Persistence
> interfaces at once in a consistent way), I do agree with Dmitri's point
> that right now making the "BasePersistence" interface be an aggregator
> interface that is directly used at callsites feels like we're sort of stuck
> in a one-foot-in-the-door state in terms of interface decomposition.
>
> It seems the splitting of factory and accessor methods could potentially be
> lightweight initially, since as I understand it, our first step would just
> be to push down the multiple-interface-inheritance into the concrete impls
> like "JdbcBasePersistenceImpl implements BasePersistence,
> MetricsPersistence, PolicyMappingPersistence, IdempotencyPersistence,
> IntegrationPersistence".
>
> At a glance, it looks like we'd need to split out some
> "getOrCreateMetricsPersistence", "getOrCreateIdempotencyPersistence", etc.,
> in MetaStoreManagerFactory and probably also mirror that set of accessors
> into PolarisCallContext alongside the current "BasePersistence
> getMetaStore()".
>
> It would probably also be a good time to rename the "getOrCreateSession"
> method to better capture what it's doing. And also rename
> "PolarisCallContext.getMetaStore()".
>
> I remember this decomposition was the plan back when I split
> BasePersistence vs IntegrationPersistence already -- ideally the
> introduction of PolicyMappingPersistence and MetricsPersistence should've
> followed the pattern of BasePersistence vs IntegrationPersistence already
> (though we didn't quite finish pulling IntegrationPersistence into the
> factory methods, instead just using a blind cast of
> ((IntegrationPersistence) ms) in the relevant callsites as a "temporary"
> placeholder that accidentally became permanent). Basically impls could
> always multi-inherit BasePersistence and FooPersistence, etc., but
> BasePersistence itself did *not* directly extend IntegrationPersistence.
>
> There are some caveats though:
>
> So, if we can agree
> > that IdempotencyPersistence is not conceptually related to Metastore
> > Persistence (my points above) then introducing tight coupling between
> > BasePersistence and IdempotencyPersistence is pure tech debt.
>
>
> I'm not sure I agree they're not conceptually related, though maybe I need
> to deep dive review the overall design. I agree that "coincidental" tight
> coupling doesn't help anything (so yes we should decompose the interfaces),
> but I think we need to define exactly what type of coupling we intend to
> support the correct semantics.
>
> The main thing: These idempotency keys are fundamentally to support
> ACID-related semantics, unlike best-effort caching types of use cases.
> Thus, the existence or nonexistence of idempotency records must be coupled
> in *some way* to their associated request "side effects", which in this
> case means core metastore persistence state.
>
> If we extend the original transactionality semantics, this is actually
> somewhat well defined:
>
> 1. TransactionalPersistence will continue to be an aggregator of
> BasePersistence and IdempotencyPersistence, with the implication that its
> own "runInTransaction" block covers both the BasePersistence and
> IdempotencyPersistence calls in a single atomic unit.
> 2. AtomicOperationMetaStoreManager intentionally does *not* hold a
> TransactionalPersistence, so we know that either we're sacrificing certain
> race-condition cases without transactions, *or* we define some higher-level
> "loose coupling" that enables a multi-phase commit semantic to ensure
> cohesive state between PolarisEntity and IdempotencyRecords.
>
> If we wanted to be more "formal" with that "loose coupling" it might mean
> adding a different interface like
> "NonTransactionalIdempotencyToEntityCoordinator" with methods like
> "PolarisEntity addIdempotencyBookkeepingToEntityBeforeChange(IdempotencyKey
> key, PolarisEntity entityBeingChanged)" and maybe different
> postEntityCommit or abortEntityCommitForIdempotencyKey methods related to
> the multi-phase commit handling.
>
>
> On Wed, May 6, 2026 at 2:27 PM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi All,
> >
> > As far as I can tell, what I proposed in my previous email on this thread
> > aligns with the SPI evolution direction in Metrics Persistence [1].
> >
> > [1] https://lists.apache.org/thread/7kp36t8fykg2hvzkr96r5m7cdnnyopj9
> >
> > Cheers,
> > Dmitri.
> >
> > On Wed, May 6, 2026 at 4:55 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Huaxin,
> > >
> > > I agree with your analysis. However, regarding the design direction
> part,
> > > from my POV, it would be preferable to introduce a parallel factory
> > > dedicated to IdempotencyPersistence.
> > >
> > > Idempotency Persistence is not conceptually connected to Metastore
> > > Persistence.
> > >
> > > Therefore, a BasePersistence (Metastore) implementation using a
> > particular
> > > database technology does not have to implement IdempotencyPersistence
> > using
> > > the same technology.
> > >
> > > Moreover, it is conceivable that a Polaris admin may want to use
> > different
> > > database technologies for IdempotencyPersistence and Metastore
> > Persistence.
> > > I'd even suggest that this approach may be advisable because the
> > > performance and data consistency characteristics expected
> > > of IdempotencyPersistence and the Metastore are very different due to
> > their
> > > distinct use cases.
> > >
> > > When SPIs are separated, a particular implementation of Metastore
> > > Persistence can also implement IdempotencyPersistence and vice versa.
> > >
> > > Leaving `default` methods unimplemented (current state of [4269]) may
> > > permit some Metastore implemetation to ignore the
> IdempotencyPersistence
> > > SPI, but in that situation the Polaris admin user would have no option
> to
> > > support IdempotencyPersistence via a different technology.
> > >
> > > All in all, from my POV I see no disadvantage to isolating these SPIs,
> > > whereas I do see disadvantages in their coupling.
> > >
> > > This is also the direction I'd advocate in a more general Polaris SPI
> > > discussion (which has not fully started on `dev` yet).
> > >
> > > Re: PR [4269] specifically, I believe isolating IdempotencyPersistence
> > > from BasePersistence is not technically difficult. So, if we can agree
> > > that IdempotencyPersistence is not conceptually related to Metastore
> > > Persistence (my points above) then introducing tight coupling between
> > > BasePersistence and IdempotencyPersistence is pure tech debt. Instead
> of
> > > adding this coupling only to remove it in a follow-up PR, I believe it
> > > would require less work to restructure the code and avoid introducing
> the
> > > coupling upfront.
> > >
> > > [4269] https://github.com/apache/polaris/pull/4269
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Tue, May 5, 2026 at 7:45 PM huaxin gao <[email protected]>
> > wrote:
> > >
> > >> Hi Dimitri,
> > >>
> > >> Thanks for raising this. I'd prefer email so it's async and easier for
> > >> everyone to participate at their own pace. A dedicated sync can come
> > later
> > >> if email gets stuck.
> > >>
> > >> My understanding is that BasePersistence already extends
> > >> PolicyMappingPersistence and MetricsPersistence, plus has event
> > >> persistence
> > >> folded in directly. That's because AtomicOperationMetaStoreManager
> > >> operates
> > >> against a single BasePersistence instance per realm, and each backend
> > >> (in-memory, JDBC, NoSQL) has one factory chain producing it. So to
> > follow
> > >> the current architecture, we need to make BasePersistence extend
> > >> IdempotencyPersistence too. Splitting IdempotencyPersistence out as a
> > >> top-level SPI would mean adding parallel factory plumbing in every
> > >> backend.
> > >>
> > >> Happy to discuss the broader SPI shape on this thread. I'd prefer to
> > land
> > >> #4269 with the current shape since it follows the agreed-upon
> > >> architecture,
> > >> and revisit the layering uniformly in a follow-up once we converge —
> > that
> > >> broader discussion may take a while.
> > >>
> > >> Thanks,
> > >>
> > >> Huaxin
> > >>
> > >> On Tue, May 5, 2026 at 2:29 PM Dmitri Bourlatchkov <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi Huaxin, Yufei,
> > >> >
> > >> > I left some comments on [4269] mostly about SPI design concerns.
> > >> >
> > >> > Since we wanted to reopen a more general SPI discussion anyway, it
> > >> might be
> > >> > worth having a dedicated sync about that...
> > >> >
> > >> > Alternatively, we could have that discussion by email now, if you
> > >> prefer. I
> > >> > have some starting points to share for discussion.
> > >> >
> > >> > WDYT?
> > >> >
> > >> > [4269] https://github.com/apache/polaris/pull/4269
> > >> >
> > >> > Thanks,
> > >> > Dmitri.
> > >> >
> > >> > On Fri, Apr 17, 2026 at 7:07 PM huaxin gao <[email protected]>
> > >> wrote:
> > >> >
> > >> > > Hi Robert,
> > >> > >
> > >> > > Thanks, and I’m glad we’re aligned on the handler-level
> direction. I
> > >> also
> > >> > > appreciate the detailed review and feedback throughout this
> > >> discussion.
> > >> > >
> > >> > > Since the old implementation is going away and the handler-level
> > >> > > implementation will be in a fresh PR, would you be OK removing the
> > -1
> > >> > from
> > >> > > PR #3803 <https://github.com/apache/polaris/pull/3803>? If so,
> I’ll
> > >> > close
> > >> > > the old PR and open a new one for the handler-level approach.
> > >> > >
> > >> > > Best,
> > >> > >
> > >> > > Huaxin
> > >> > >
> > >> > > On Thu, Apr 16, 2026 at 12:57 AM Robert Stupp <[email protected]>
> > wrote:
> > >> > >
> > >> > > > i Yufei, Huaxin,
> > >> > > >
> > >> > > > Thanks for the thoughtful replies, and for opening the issue for
> > >> > > > `createTableStaged` [1]. I’m glad we’re converging on the
> > >> handler-level
> > >> > > > direction.
> > >> > > >
> > >> > > > I agree benchmarking can run in parallel with implementation.
> For
> > my
> > >> > own
> > >> > > > review, I’d prefer to wait until reproducible benchmark results
> > are
> > >> > > posted
> > >> > > > (using the scope from my April 14 note, or with any scope
> changes
> > >> > called
> > >> > > > out).
> > >> > > >
> > >> > > > Happy to review once that is available.
> > >> > > >
> > >> > > > Best,
> > >> > > > Robert
> > >> > > >
> > >> > > > [1] https://github.com/apache/polaris/issues/4200
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Apr 14, 2026 at 7:54 PM huaxin gao <
> > [email protected]>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Hi Robert,
> > >> > > > >
> > >> > > > > Thanks for the feedback. Glad the handler-level design
> addresses
> > >> the
> > >> > > > > concerns.
> > >> > > > >
> > >> > > > > I also feel we should converge on the design and
> implementation
> > >> > first,
> > >> > > > and
> > >> > > > > run benchmarks in parallel. The access patterns will be
> > different
> > >> > with
> > >> > > > the
> > >> > > > > handler-level approach, so benchmarking the current
> filter-based
> > >> code
> > >> > > > > wouldn't be representative.
> > >> > > > >
> > >> > > > > I'll open a GitHub issue to track createTableStaged support,
> and
> > >> I'll
> > >> > > > open
> > >> > > > > a fresh PR for the handler-level implementation.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > >
> > >> > > > > Huaxin
> > >> > > > >
> > >> > > > > On Tue, Apr 14, 2026 at 10:30 AM Yufei Gu <
> [email protected]
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > Hi Robert,
> > >> > > > > >
> > >> > > > > > Thanks for the thorough review and for clearly laying out
> the
> > >> > > concerns,
> > >> > > > > > this is very helpful.
> > >> > > > > >
> > >> > > > > > For point 1(Q3), I do not think this needs to be a hard
> > >> > prerequisite
> > >> > > > > before
> > >> > > > > > looking at the implementation.
> > >> > > > > >
> > >> > > > > > The handler level design and correctness concerns are
> largely
> > >> > > > independent
> > >> > > > > > from the exact performance characteristics. It would be more
> > >> > > productive
> > >> > > > > to
> > >> > > > > > first converge on the right architecture, then validate and
> > >> iterate
> > >> > > on
> > >> > > > > > performance with concrete measurements. Also, while the
> store
> > is
> > >> > > > reused,
> > >> > > > > > the access pattern and critical sections change with the
> > handler
> > >> > > level
> > >> > > > > > approach. Measuring the current implementation in isolation
> > may
> > >> not
> > >> > > > > reflect
> > >> > > > > > the actual behavior of the final design.
> > >> > > > > >
> > >> > > > > > We already have a similar situation with the NoSQL
> benchmark,
> > >> where
> > >> > > the
> > >> > > > > > code has been merged for a while but the benchmarking work
> is
> > >> still
> > >> > > in
> > >> > > > > > progress. That has not blocked design or code review, and we
> > are
> > >> > > > > iterating
> > >> > > > > > on performance as we go.
> > >> > > > > >
> > >> > > > > > I agree that realistic benchmarking is important. But I
> would
> > >> > suggest
> > >> > > > we
> > >> > > > > > treat this as a parallel track, not a gate. Otherwise we
> risk
> > >> > > blocking
> > >> > > > > > progress on the design without knowing whether the measured
> > >> > overhead
> > >> > > is
> > >> > > > > > even representative of the final system.
> > >> > > > > >
> > >> > > > > > Happy to help define a benchmark plan once the design is
> > >> settled.
> > >> > > > > >
> > >> > > > > > Yufei
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Tue, Apr 14, 2026 at 6:28 AM Robert Stupp <
> [email protected]>
> > >> > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Huaxin,
> > >> > > > > > >
> > >> > > > > > > yeah, the handler-level approach is the right call, I'm
> glad
> > >> you
> > >> > > went
> > >> > > > > > with
> > >> > > > > > > it.
> > >> > > > > > >
> > >> > > > > > > To be explicit about what your proposal aims to resolve:
> > >> > > > > > >
> > >> > > > > > > - Authorization bypass is fixed. The check now runs after
> > >> > > > > resource-level
> > >> > > > > > > authorization.
> > >> > > > > > > - No credentials in the database. Fresh credentials
> > re-vended
> > >> on
> > >> > > > replay
> > >> > > > > > via
> > >> > > > > > > the normal vending path.
> > >> > > > > > > - Phase 2 architecture problem gone. The handler has the
> > >> > principal,
> > >> > > > > > storage
> > >> > > > > > > config, and IAM access that the filter never could have.
> > >> > > > > > > - Caller identity is in the binding.
> > >> > > > > > >
> > >> > > > > > > Three things still need adressing:
> > >> > > > > > >
> > >> > > > > > > First, and this is a hard prerequisite before I look at
> any
> > >> > > > > > implementation
> > >> > > > > > > code: Q3 from April 5 is still open. The store is being
> > reused
> > >> > > as-is,
> > >> > > > > so
> > >> > > > > > > the overhead can be measured right now without new handler
> > >> code.
> > >> > 3+
> > >> > > > DB
> > >> > > > > > > round-trips per mutation on the catalog connection pool,
> > >> > write-hot
> > >> > > > > table,
> > >> > > > > > > polling loops under duplicate traffic - those numbers need
> > to
> > >> > come
> > >> > > > > from a
> > >> > > > > > > realistic setup: networked database under concurrent load,
> > >> not a
> > >> > > > local
> > >> > > > > > > Postgres. The latency concerns are specifically about
> Aurora
> > >> and
> > >> > > > > > > CockroachDB, and the polling storm only shows up under
> > >> concurrent
> > >> > > > > > duplicate
> > >> > > > > > > traffic. Please share the methodology so the results are
> > >> > > > reproducible.
> > >> > > > > > >
> > >> > > > > > > Second: please open a GitHub issue for the
> createTableStaged
> > >> > > > deferral.
> > >> > > > > > >
> > >> > > > > > > Third: since the filter is going away entirely and the
> > >> critical
> > >> > > > > sections
> > >> > > > > > > move to the handlers, does it make sense to close #3803
> and
> > >> open
> > >> > a
> > >> > > > > fresh
> > >> > > > > > > PR? The old diff is mostly code that's being deleted, and
> it
> > >> > > carries
> > >> > > > a
> > >> > > > > > lot
> > >> > > > > > > of review history. A clean PR would be easier for everyone
> > to
> > >> > > review
> > >> > > > -
> > >> > > > > > just
> > >> > > > > > > reference the old one for context.
> > >> > > > > > >
> > >> > > > > > > Happy to do a fresh review once the new implementation is
> > up.
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Robert
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <
> > >> > [email protected]
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Robert,
> > >> > > > > > > >
> > >> > > > > > > > Here is a revised design proposal for the idempotency
> > replay
> > >> > > > > mechanism.
> > >> > > > > > > I'd
> > >> > > > > > > > appreciate your feedback. Thanks!
> > >> > > > > > > >
> > >> > > > > > > > *Moving idempotency to handler level*
> > >> > > > > > > >
> > >> > > > > > > > The idempotency check moves from the HTTP filter into
> the
> > >> > handler
> > >> > > > > > > methods,
> > >> > > > > > > > after authorization. The filter is removed. The caller's
> > >> > > principal
> > >> > > > > hash
> > >> > > > > > > is
> > >> > > > > > > > included in the idempotency binding and checked on
> replay.
> > >> > > > > > > >
> > >> > > > > > > > *Credential-bearing mutations (createTable)*
> > >> > > > > > > >
> > >> > > > > > > > On replay, instead of returning a stored response, the
> > >> handler
> > >> > > > calls
> > >> > > > > > > > buildLoadTableResponseWithDelegationCredentials with the
> > >> > existing
> > >> > > > > > table's
> > >> > > > > > > > metadata. This vends fresh credentials for the current
> > >> caller
> > >> > > > through
> > >> > > > > > the
> > >> > > > > > > > normal credential vending path. No credentials are
> stored
> > in
> > >> > the
> > >> > > > > > > database —
> > >> > > > > > > > the idempotency record only stores the key, principal
> > hash,
> > >> > table
> > >> > > > > > > > identifier, and status code.
> > >> > > > > > > >
> > >> > > > > > > > *Non-credential mutations (dropTable, renameTable,
> > >> > namespace/view
> > >> > > > > > > > operations)*
> > >> > > > > > > >
> > >> > > > > > > > These do not vend credentials, so there is no leakage
> > >> concern.
> > >> > On
> > >> > > > > > replay,
> > >> > > > > > > > the handler returns the stored status code and minimal
> > >> response
> > >> > > > body.
> > >> > > > > > > >
> > >> > > > > > > > *createTableStaged*
> > >> > > > > > > >
> > >> > > > > > > > Staged tables are not persisted to the catalog, so the
> > >> method
> > >> > can
> > >> > > > > > safely
> > >> > > > > > > > re-run on retry. Idempotency support for this endpoint
> can
> > >> be
> > >> > > > > deferred.
> > >> > > > > > > >
> > >> > > > > > > > *What is reused from the current implementation*
> > >> > > > > > > >
> > >> > > > > > > > The IdempotencyStore interface, database schema,
> > >> configuration,
> > >> > > and
> > >> > > > > > > > background purge logic are reused. The main change is
> > moving
> > >> > the
> > >> > > > > > > > check/finalize logic from the filter into the handler.
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > >
> > >> > > > > > > > Huaxin
> > >> > > > > > > >
> > >> > > > > > > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao <
> > >> > > [email protected]
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Robert,
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks for the follow-up.
> > >> > > > > > > > >
> > >> > > > > > > > > I appreciate the clarification. I read the listed
> items
> > as
> > >> > fix
> > >> > > > > > options
> > >> > > > > > > > and
> > >> > > > > > > > > picked option 2, and the Phase 1/Phase 2 split was
> > purely
> > >> to
> > >> > > keep
> > >> > > > > the
> > >> > > > > > > PR
> > >> > > > > > > > at
> > >> > > > > > > > > a reviewable size, not a design choice.
> > >> > > > > > > > >
> > >> > > > > > > > > I agree that the caller's identity should be part of
> the
> > >> > > binding.
> > >> > > > > > That
> > >> > > > > > > is
> > >> > > > > > > > > straightforward to add.
> > >> > > > > > > > >
> > >> > > > > > > > > On the architecture question, whether re-vending can
> > work
> > >> > > within
> > >> > > > > the
> > >> > > > > > > > > filter or needs a handler-level approach, I want to
> give
> > >> this
> > >> > > > > proper
> > >> > > > > > > > > thought rather than rush an answer. I may not be able
> to
> > >> get
> > >> > > back
> > >> > > > > to
> > >> > > > > > > you
> > >> > > > > > > > on
> > >> > > > > > > > > this until next week (the week of 4/13).
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > >
> > >> > > > > > > > > Huaxin
> > >> > > > > > > > >
> > >> > > > > > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <
> > >> [email protected]
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > >> Hi Huaxin,
> > >> > > > > > > > >>
> > >> > > > > > > > >> Thanks for responding. A few corrections and
> > >> clarifications.
> > >> > > > > > > > >>
> > >> > > > > > > > >> My PR review described three consequences of the
> > missing
> > >> > > caller
> > >> > > > > > > > >> identity binding and the credential leakage attack
> > >> vector.
> > >> > > Those
> > >> > > > > > > > >> were not suggestions for a Phase 1/Phase 2 split. The
> > >> > offline
> > >> > > > > > > > >> decision to split into phases, and what each phase
> > >> contains,
> > >> > > was
> > >> > > > > > > > >> yours [1].
> > >> > > > > > > > >>
> > >> > > > > > > > >> On the phasing: the concern is not PR size. The
> concern
> > >> is
> > >> > > that
> > >> > > > > > > > >> Phase 1 alone ships a full authorization bypass. The
> > >> filter
> > >> > > > > replays
> > >> > > > > > > > >> cached responses before Polaris's resource-level
> > >> > authorization
> > >> > > > > > > > >> runs. Stripping credentials does not fix that. Any
> > >> > > authenticated
> > >> > > > > > > > >> user with a valid idempotency key can receive a
> cached
> > >> > > response
> > >> > > > > > > > >> for a resource they are not authorized to access.
> That
> > is
> > >> > the
> > >> > > > case
> > >> > > > > > > > >> with or without credentials in the response.
> > >> > > > > > > > >>
> > >> > > > > > > > >> "We will not release it until entire idempotency
> coding
> > >> is
> > >> > > > > > > > >> finished" is reassuring, but the architecture
> question
> > >> > > remains:
> > >> > > > > > > > >> can the generic filter approach support re-vending at
> > >> all?
> > >> > My
> > >> > > > > > > > >> email explained why I believe it cannot, since it
> would
> > >> > > require
> > >> > > > > > > > >> the filter to authorize the caller, resolve storage
> > >> > locations,
> > >> > > > > > > > >> call cloud IAM, and manage credential caching. That
> is
> > >> the
> > >> > > > catalog
> > >> > > > > > > > >> handler's job, not a filter's.
> > >> > > > > > > > >>
> > >> > > > > > > > >> The questions from my original email that still need
> > >> > answers:
> > >> > > > > > > > >>
> > >> > > > > > > > >> How does the binding prevent cross-principal cache
> hits
> > >> > > without
> > >> > > > > > > > >> including the caller's identity?
> > >> > > > > > > > >>
> > >> > > > > > > > >> How does Phase 2 re-vend credentials from a generic
> > HTTP
> > >> > > filter
> > >> > > > > > > > >> without reimplementing the catalog handler's
> > >> authorization
> > >> > and
> > >> > > > > > > > >> credential vending logic?
> > >> > > > > > > > >>
> > >> > > > > > > > >> Has the operational overhead (3 DB round-trips per
> > >> mutation
> > >> > on
> > >> > > > > > > > >> the catalog's connection pool) been benchmarked on
> any
> > >> > > backend?
> > >> > > > > > > > >>
> > >> > > > > > > > >> I'd like to resolve these on this thread before more
> > >> code is
> > >> > > > > > > > >> written.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Best,
> > >> > > > > > > > >> Robert
> > >> > > > > > > > >>
> > >> > > > > > > > >> [1]
> > >> > > > > > >
> > >> > >
> https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao <
> > >> > > > [email protected]
> > >> > > > > >
> > >> > > > > > > > wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >> > Hi Robert,
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Thanks for raising this on the mailing list. I
> > >> acknowledge
> > >> > > the
> > >> > > > > > > design
> > >> > > > > > > > >> > concerns you listed.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > To clarify: I picked your option 2 (strip
> credentials
> > >> and
> > >> > > > > re-vend
> > >> > > > > > on
> > >> > > > > > > > >> > replay) you suggested in the PR review. The offline
> > >> > > discussion
> > >> > > > > > with
> > >> > > > > > > > >> Yufei
> > >> > > > > > > > >> > and Prashant was simply to verify that I picked the
> > >> best
> > >> > > > option
> > >> > > > > > from
> > >> > > > > > > > all
> > >> > > > > > > > >> > the three suggestions you listed in the review.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > The reason I split it into two phases is that the
> > >> current
> > >> > PR
> > >> > > > is
> > >> > > > > > > > already
> > >> > > > > > > > >> > around 3000 lines. Adding re-vending logic on top
> > would
> > >> > make
> > >> > > > it
> > >> > > > > > very
> > >> > > > > > > > >> hard
> > >> > > > > > > > >> > to review. This is a big feature and we need to do
> it
> > >> > piece
> > >> > > by
> > >> > > > > > > piece.
> > >> > > > > > > > We
> > >> > > > > > > > >> > will not release it until entire idempotency coding
> > is
> > >> > > > finished.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > I'm happy to discuss the architecture further on
> this
> > >> > > thread,
> > >> > > > > > > whether
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > filter approach can support re-vending, or whether
> we
> > >> > need a
> > >> > > > > > > > >> handler-level
> > >> > > > > > > > >> > design. I may be slow to respond next week but I'll
> > >> follow
> > >> > > up
> > >> > > > > > after
> > >> > > > > > > > >> that.
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Thanks,
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Huaxin
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <
> > >> > [email protected]
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > > Hi all,
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > I need to raise some fundamental design concerns
> > with
> > >> > the
> > >> > > > > > current
> > >> > > > > > > > >> > > Idempotency-Key implementation in PR #3803 [1].
> > Some
> > >> of
> > >> > > > these
> > >> > > > > > were
> > >> > > > > > > > >> > > raised in November 2025 [2] and were acknowledged
> > [3]
> > >> > but
> > >> > > > > never
> > >> > > > > > > > >> > > addressed in the implementation.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > I raised the caller identity concern on the
> > >> > > > > > > > >> > > mailing list in November 2025 [2]. In December
> 2025
> > >> > [4], I
> > >> > > > > > > > >> > > reiterated that all existing implementations I
> > could
> > >> > find
> > >> > > do
> > >> > > > > > > > >> > > request fingerprinting including the request
> body,
> > >> and
> > >> > > > warned
> > >> > > > > > that
> > >> > > > > > > > >> > > the idempotency subsystem itself must not become
> a
> > >> > source
> > >> > > of
> > >> > > > > > > > >> > > failures. The PR was opened in February 2026
> > without
> > >> > > > > addressing
> > >> > > > > > > > >> > > either point.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The credential leakage consequences surfaced
> during
> > >> code
> > >> > > > > review
> > >> > > > > > > > >> > > [5]. The response was an offline decision between
> > >> three
> > >> > > > > > > > contributors,
> > >> > > > > > > > >> > > without mailing list discussion [6]. The decision
> > >> was to
> > >> > > > split
> > >> > > > > > > > >> > > into Phase 1 (strip credentials, which produces
> > >> broken
> > >> > > > > responses
> > >> > > > > > > > >> > > but does not fix the authorization bypass) and
> > Phase
> > >> 2
> > >> > > > > (re-vend
> > >> > > > > > > > >> > > on replay, which would require a complete
> redesign
> > >> and
> > >> > > > > > > > >> > > reimplementation). Details below.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > I think we need to resolve these at the design
> > level
> > >> > > before
> > >> > > > > more
> > >> > > > > > > > >> > > code is written.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > I will not do further code reviews on this PR
> until
> > >> > these
> > >> > > > > design
> > >> > > > > > > > >> > > concerns are resolved.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > 1. Caller identity is not part of the idempotency
> > >> > binding
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The caller's identity is not part of the binding.
> > The
> > >> > > > > > > > >> > > filter replays cached responses before Polaris's
> > >> > > > > resource-level
> > >> > > > > > > > >> > > authorization runs. Any authenticated user with a
> > >> valid
> > >> > > > > > > idempotency
> > >> > > > > > > > >> > > key can receive a cached response for a resource
> > they
> > >> > are
> > >> > > > not
> > >> > > > > > > > >> > > authorized to access, including vended storage
> > >> > > credentials.
> > >> > > > > This
> > >> > > > > > > > >> > > is a full authorization bypass.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Idempotency keys are HTTP headers that end up in
> > >> access
> > >> > > > logs,
> > >> > > > > > > > >> > > observability tooling, debug output. They cannot
> > be a
> > >> > > > security
> > >> > > > > > > > >> > > boundary. I described the attack vector in my PR
> > >> review
> > >> > > [5].
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The proposed Phase 2 [6] says it will
> "re-authorize
> > >> the
> > >> > > user
> > >> > > > > and
> > >> > > > > > > > >> > > re-vend fresh credentials on replay." That
> > addresses
> > >> > > > > credentials
> > >> > > > > > > > >> > > but not the binding itself.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > 2. Credential stripping breaks Polaris's
> deployment
> > >> > model
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Without mailing list discussion, the decision was
> > >> made
> > >> > > > offline
> > >> > > > > > > > >> > > to strip credentials before storing ("Phase 1")
> > [6].
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Polaris's purpose is to vend down-scoped
> > credentials.
> > >> > > > Clients
> > >> > > > > > have
> > >> > > > > > > > >> > > no ambient storage access.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > A stripped response returns a successful response
> > >> > ("table
> > >> > > > > > > created")
> > >> > > > > > > > >> > > to the client, which then fails hard because it
> > >> cannot
> > >> > > > access
> > >> > > > > > the
> > >> > > > > > > > >> > > object storage. Query engines (Spark, Trino,
> etc.)
> > >> will
> > >> > > fail
> > >> > > > > on
> > >> > > > > > > > >> > > the first storage operation against that table.
> > Query
> > >> > > engine
> > >> > > > > > users
> > >> > > > > > > > >> > > have no way to resolve this, they don't control
> the
> > >> REST
> > >> > > > > catalog
> > >> > > > > > > > >> > > interaction. For stage-create, the client gets a
> > >> > > successful
> > >> > > > > > > > >> > > response telling it to write data but has no
> > >> credentials
> > >> > > to
> > >> > > > > > write
> > >> > > > > > > > >> > > with. No fallback exists.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The stripped response is spec-compliant, but
> > >> > > spec-compliant
> > >> > > > > and
> > >> > > > > > > > >> > > operationally correct are not the same thing.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > 3. Phase 2 is architecturally incompatible with
> > >> Phase 1
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The filter is a generic HTTP filter below the
> > >> > application
> > >> > > > > layer.
> > >> > > > > > > It
> > >> > > > > > > > >> > > treats responses as opaque JSON. Credential
> vending
> > >> > > happens
> > >> > > > > deep
> > >> > > > > > > > >> > > inside IcebergCatalogHandler, scoped to specific
> > >> table
> > >> > > > > locations
> > >> > > > > > > > >> > > and principals, through cloud IAM calls (STS
> > >> AssumeRole,
> > >> > > GCS
> > >> > > > > > token
> > >> > > > > > > > >> > > exchange).
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Re-vending on replay would require the filter to
> > >> resolve
> > >> > > the
> > >> > > > > > > > >> > > table's storage location, authorize the caller,
> > call
> > >> > cloud
> > >> > > > IAM
> > >> > > > > > to
> > >> > > > > > > > >> > > mint scoped credentials, and manage credential
> > >> caching.
> > >> > > That
> > >> > > > > is
> > >> > > > > > > > >> > > not a filter, that is reimplementing the catalog
> > >> > handler's
> > >> > > > > > > > >> > > authorization and credential vending logic.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Phase 1 and Phase 2 need to be designed and
> shipped
> > >> > > > together.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > 4. Operational overhead
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Every idempotent mutation adds 3 database
> > round-trips
> > >> > > > (INSERT,
> > >> > > > > > > > UPDATE,
> > >> > > > > > > > >> > > heartbeat UPDATEs), each borrowing its own
> > connection
> > >> > > > > > > > >> > > (autoCommit=true, no reuse), sharing the
> catalog's
> > >> > > > connection
> > >> > > > > > pool
> > >> > > > > > > > >> > > and RDBMS. This added latency increases the
> > >> probability
> > >> > of
> > >> > > > > > client
> > >> > > > > > > > >> > > timeouts, which are exactly the transient
> failures
> > >> that
> > >> > > > > > > idempotency
> > >> > > > > > > > >> > > is supposed to prevent.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The idempotency table is high-churn and
> write-hot,
> > >> > > competing
> > >> > > > > > with
> > >> > > > > > > > >> > > catalog operations for connection pool capacity.
> > The
> > >> > > > overhead
> > >> > > > > > gets
> > >> > > > > > > > >> > > worse on managed and distributed database
> backends.
> > >> > > > Duplicate
> > >> > > > > > > > >> > > requests trigger polling loops with repeated
> > database
> > >> > > reads.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > No benchmark data has been presented for any
> > backend.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > What needs to change
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > The current generic-filter approach cannot
> support
> > >> these
> > >> > > > > > > > >> > > requirements. This needs a different design.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > A new design must address the following:
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > a) The idempotency binding MUST include the
> > caller's
> > >> > > > principal
> > >> > > > > > > > >> > >    identity.
> > >> > > > > > > > >> > > b) Credential-bearing responses must not be
> stored
> > >> > as-is.
> > >> > > On
> > >> > > > > > > > >> > >    replay, the handler must build a fresh
> response
> > >> with
> > >> > > > fresh
> > >> > > > > > > > >> > >    credentials for the current caller.
> > >> > > > > > > > >> > > c) The operational overhead needs to be
> > benchmarked.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > I am happy to review code once the design
> addresses
> > >> > these
> > >> > > > > > points.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Best,
> > >> > > > > > > > >> > > Robert
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > [1] https://github.com/apache/polaris/pull/3803
> > >> > > > > > > > >> > > [2]
> > >> > > > > > > >
> > >> > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l
> > >> > > > > > > > >> > >     (Nov 24, 2025: "Building a service-internal
> key
> > >> from
> > >> > > the
> > >> > > > > > > client
> > >> > > > > > > > >> > >     provided idempotency key plus more data from
> > the
> > >> > > request
> > >> > > > > > gives
> > >> > > > > > > > >> > >     better uniqueness. That more data can come
> > from:
> > >> > > > operation
> > >> > > > > > id,
> > >> > > > > > > > >> > >     all operation parameters, including the whole
> > >> > payload,
> > >> > > > > > quite a
> > >> > > > > > > > >> > >     few HTTP request headers like user-agent,
> > >> > > authorization,
> > >> > > > > > host,
> > >> > > > > > > > >> > >     accept-* and also the client's address.")
> > >> > > > > > > > >> > > [3]
> > >> > > > > > > >
> > >> > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b
> > >> > > > > > > > >> > >     (Dec 8, 2025: "I agree that a bare
> > >> Idempotency-Key +
> > >> > > > > entity
> > >> > > > > > > > >> > >     identifier is not enough to protect against
> > >> buggy or
> > >> > > > > > malicious
> > >> > > > > > > > >> > >     clients; fingerprinting the full request
> would
> > be
> > >> > > > > > stronger.")
> > >> > > > > > > > >> > > [4]
> > >> > > > > > > >
> > >> > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264
> > >> > > > > > > > >> > >     (Dec 8, 2025: "We should avoid failing
> Polaris
> > if
> > >> > this
> > >> > > > > > > subsystem
> > >> > > > > > > > >> > >     fails, or letting this subsystem be a reason
> > for
> > >> its
> > >> > > > > > existence
> > >> > > > > > > > >> > >     (aka retry due to timeouts because the
> > >> > > > idempotent-request
> > >> > > > > > > > >> > >     subsystem hangs).")
> > >> > > > > > > > >> > > [5]
> > >> > > > > > > > >> > >
> > >> > > > > > > > >>
> > >> > > > > > > >
> > >> > > > > >
> > >> > > >
> > >> >
> > >>
> > https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055
> > >> > > > > > > > >> > > [6]
> > >> > > > > > > > >>
> > >> > > > >
> > >> https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> >
> > >> > > > > > > > >>
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

Reply via email to