Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

Dmitri Bourlatchkov Wed, 06 May 2026 14:28:23 -0700

Hi All,

As far as I can tell, what I proposed in my previous email on this thread
aligns with the SPI evolution direction in Metrics Persistence [1].


[1] https://lists.apache.org/thread/7kp36t8fykg2hvzkr96r5m7cdnnyopj9

Cheers,
Dmitri.

On Wed, May 6, 2026 at 4:55 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Huaxin,
>
> I agree with your analysis. However, regarding the design direction part,
> from my POV, it would be preferable to introduce a parallel factory
> dedicated to IdempotencyPersistence.
>
> Idempotency Persistence is not conceptually connected to Metastore
> Persistence.
>
> Therefore, a BasePersistence (Metastore) implementation using a particular
> database technology does not have to implement IdempotencyPersistence using
> the same technology.
>
> Moreover, it is conceivable that a Polaris admin may want to use different
> database technologies for IdempotencyPersistence and Metastore Persistence.
> I'd even suggest that this approach may be advisable because the
> performance and data consistency characteristics expected
> of IdempotencyPersistence and the Metastore are very different due to their
> distinct use cases.
>
> When SPIs are separated, a particular implementation of Metastore
> Persistence can also implement IdempotencyPersistence and vice versa.
>
> Leaving `default` methods unimplemented (current state of [4269]) may
> permit some Metastore implemetation to ignore the IdempotencyPersistence
> SPI, but in that situation the Polaris admin user would have no option to
> support IdempotencyPersistence via a different technology.
>
> All in all, from my POV I see no disadvantage to isolating these SPIs,
> whereas I do see disadvantages in their coupling.
>
> This is also the direction I'd advocate in a more general Polaris SPI
> discussion (which has not fully started on `dev` yet).
>
> Re: PR [4269] specifically, I believe isolating IdempotencyPersistence
> from BasePersistence is not technically difficult. So, if we can agree
> that IdempotencyPersistence is not conceptually related to Metastore
> Persistence (my points above) then introducing tight coupling between
> BasePersistence and IdempotencyPersistence is pure tech debt. Instead of
> adding this coupling only to remove it in a follow-up PR, I believe it
> would require less work to restructure the code and avoid introducing the
> coupling upfront.
>
> [4269] https://github.com/apache/polaris/pull/4269
>
> Cheers,
> Dmitri.
>
> On Tue, May 5, 2026 at 7:45 PM huaxin gao <[email protected]> wrote:
>
>> Hi Dimitri,
>>
>> Thanks for raising this. I'd prefer email so it's async and easier for
>> everyone to participate at their own pace. A dedicated sync can come later
>> if email gets stuck.
>>
>> My understanding is that BasePersistence already extends
>> PolicyMappingPersistence and MetricsPersistence, plus has event
>> persistence
>> folded in directly. That's because AtomicOperationMetaStoreManager
>> operates
>> against a single BasePersistence instance per realm, and each backend
>> (in-memory, JDBC, NoSQL) has one factory chain producing it. So to follow
>> the current architecture, we need to make BasePersistence extend
>> IdempotencyPersistence too. Splitting IdempotencyPersistence out as a
>> top-level SPI would mean adding parallel factory plumbing in every
>> backend.
>>
>> Happy to discuss the broader SPI shape on this thread. I'd prefer to land
>> #4269 with the current shape since it follows the agreed-upon
>> architecture,
>> and revisit the layering uniformly in a follow-up once we converge — that
>> broader discussion may take a while.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Tue, May 5, 2026 at 2:29 PM Dmitri Bourlatchkov <[email protected]>
>> wrote:
>>
>> > Hi Huaxin, Yufei,
>> >
>> > I left some comments on [4269] mostly about SPI design concerns.
>> >
>> > Since we wanted to reopen a more general SPI discussion anyway, it
>> might be
>> > worth having a dedicated sync about that...
>> >
>> > Alternatively, we could have that discussion by email now, if you
>> prefer. I
>> > have some starting points to share for discussion.
>> >
>> > WDYT?
>> >
>> > [4269] https://github.com/apache/polaris/pull/4269
>> >
>> > Thanks,
>> > Dmitri.
>> >
>> > On Fri, Apr 17, 2026 at 7:07 PM huaxin gao <[email protected]>
>> wrote:
>> >
>> > > Hi Robert,
>> > >
>> > > Thanks, and I’m glad we’re aligned on the handler-level direction. I
>> also
>> > > appreciate the detailed review and feedback throughout this
>> discussion.
>> > >
>> > > Since the old implementation is going away and the handler-level
>> > > implementation will be in a fresh PR, would you be OK removing the -1
>> > from
>> > > PR #3803 <https://github.com/apache/polaris/pull/3803>? If so, I’ll
>> > close
>> > > the old PR and open a new one for the handler-level approach.
>> > >
>> > > Best,
>> > >
>> > > Huaxin
>> > >
>> > > On Thu, Apr 16, 2026 at 12:57 AM Robert Stupp <[email protected]> wrote:
>> > >
>> > > > i Yufei, Huaxin,
>> > > >
>> > > > Thanks for the thoughtful replies, and for opening the issue for
>> > > > `createTableStaged` [1]. I’m glad we’re converging on the
>> handler-level
>> > > > direction.
>> > > >
>> > > > I agree benchmarking can run in parallel with implementation. For my
>> > own
>> > > > review, I’d prefer to wait until reproducible benchmark results are
>> > > posted
>> > > > (using the scope from my April 14 note, or with any scope changes
>> > called
>> > > > out).
>> > > >
>> > > > Happy to review once that is available.
>> > > >
>> > > > Best,
>> > > > Robert
>> > > >
>> > > > [1] https://github.com/apache/polaris/issues/4200
>> > > >
>> > > >
>> > > > On Tue, Apr 14, 2026 at 7:54 PM huaxin gao <[email protected]>
>> > > wrote:
>> > > >
>> > > > > Hi Robert,
>> > > > >
>> > > > > Thanks for the feedback. Glad the handler-level design addresses
>> the
>> > > > > concerns.
>> > > > >
>> > > > > I also feel we should converge on the design and implementation
>> > first,
>> > > > and
>> > > > > run benchmarks in parallel. The access patterns will be different
>> > with
>> > > > the
>> > > > > handler-level approach, so benchmarking the current filter-based
>> code
>> > > > > wouldn't be representative.
>> > > > >
>> > > > > I'll open a GitHub issue to track createTableStaged support, and
>> I'll
>> > > > open
>> > > > > a fresh PR for the handler-level implementation.
>> > > > >
>> > > > > Best,
>> > > > >
>> > > > > Huaxin
>> > > > >
>> > > > > On Tue, Apr 14, 2026 at 10:30 AM Yufei Gu <[email protected]>
>> > > wrote:
>> > > > >
>> > > > > > Hi Robert,
>> > > > > >
>> > > > > > Thanks for the thorough review and for clearly laying out the
>> > > concerns,
>> > > > > > this is very helpful.
>> > > > > >
>> > > > > > For point 1(Q3), I do not think this needs to be a hard
>> > prerequisite
>> > > > > before
>> > > > > > looking at the implementation.
>> > > > > >
>> > > > > > The handler level design and correctness concerns are largely
>> > > > independent
>> > > > > > from the exact performance characteristics. It would be more
>> > > productive
>> > > > > to
>> > > > > > first converge on the right architecture, then validate and
>> iterate
>> > > on
>> > > > > > performance with concrete measurements. Also, while the store is
>> > > > reused,
>> > > > > > the access pattern and critical sections change with the handler
>> > > level
>> > > > > > approach. Measuring the current implementation in isolation may
>> not
>> > > > > reflect
>> > > > > > the actual behavior of the final design.
>> > > > > >
>> > > > > > We already have a similar situation with the NoSQL benchmark,
>> where
>> > > the
>> > > > > > code has been merged for a while but the benchmarking work is
>> still
>> > > in
>> > > > > > progress. That has not blocked design or code review, and we are
>> > > > > iterating
>> > > > > > on performance as we go.
>> > > > > >
>> > > > > > I agree that realistic benchmarking is important. But I would
>> > suggest
>> > > > we
>> > > > > > treat this as a parallel track, not a gate. Otherwise we risk
>> > > blocking
>> > > > > > progress on the design without knowing whether the measured
>> > overhead
>> > > is
>> > > > > > even representative of the final system.
>> > > > > >
>> > > > > > Happy to help define a benchmark plan once the design is
>> settled.
>> > > > > >
>> > > > > > Yufei
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Apr 14, 2026 at 6:28 AM Robert Stupp <[email protected]>
>> > wrote:
>> > > > > >
>> > > > > > > Hi Huaxin,
>> > > > > > >
>> > > > > > > yeah, the handler-level approach is the right call, I'm glad
>> you
>> > > went
>> > > > > > with
>> > > > > > > it.
>> > > > > > >
>> > > > > > > To be explicit about what your proposal aims to resolve:
>> > > > > > >
>> > > > > > > - Authorization bypass is fixed. The check now runs after
>> > > > > resource-level
>> > > > > > > authorization.
>> > > > > > > - No credentials in the database. Fresh credentials re-vended
>> on
>> > > > replay
>> > > > > > via
>> > > > > > > the normal vending path.
>> > > > > > > - Phase 2 architecture problem gone. The handler has the
>> > principal,
>> > > > > > storage
>> > > > > > > config, and IAM access that the filter never could have.
>> > > > > > > - Caller identity is in the binding.
>> > > > > > >
>> > > > > > > Three things still need adressing:
>> > > > > > >
>> > > > > > > First, and this is a hard prerequisite before I look at any
>> > > > > > implementation
>> > > > > > > code: Q3 from April 5 is still open. The store is being reused
>> > > as-is,
>> > > > > so
>> > > > > > > the overhead can be measured right now without new handler
>> code.
>> > 3+
>> > > > DB
>> > > > > > > round-trips per mutation on the catalog connection pool,
>> > write-hot
>> > > > > table,
>> > > > > > > polling loops under duplicate traffic - those numbers need to
>> > come
>> > > > > from a
>> > > > > > > realistic setup: networked database under concurrent load,
>> not a
>> > > > local
>> > > > > > > Postgres. The latency concerns are specifically about Aurora
>> and
>> > > > > > > CockroachDB, and the polling storm only shows up under
>> concurrent
>> > > > > > duplicate
>> > > > > > > traffic. Please share the methodology so the results are
>> > > > reproducible.
>> > > > > > >
>> > > > > > > Second: please open a GitHub issue for the createTableStaged
>> > > > deferral.
>> > > > > > >
>> > > > > > > Third: since the filter is going away entirely and the
>> critical
>> > > > > sections
>> > > > > > > move to the handlers, does it make sense to close #3803 and
>> open
>> > a
>> > > > > fresh
>> > > > > > > PR? The old diff is mostly code that's being deleted, and it
>> > > carries
>> > > > a
>> > > > > > lot
>> > > > > > > of review history. A clean PR would be easier for everyone to
>> > > review
>> > > > -
>> > > > > > just
>> > > > > > > reference the old one for context.
>> > > > > > >
>> > > > > > > Happy to do a fresh review once the new implementation is up.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Robert
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <
>> > [email protected]
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Robert,
>> > > > > > > >
>> > > > > > > > Here is a revised design proposal for the idempotency replay
>> > > > > mechanism.
>> > > > > > > I'd
>> > > > > > > > appreciate your feedback. Thanks!
>> > > > > > > >
>> > > > > > > > *Moving idempotency to handler level*
>> > > > > > > >
>> > > > > > > > The idempotency check moves from the HTTP filter into the
>> > handler
>> > > > > > > methods,
>> > > > > > > > after authorization. The filter is removed. The caller's
>> > > principal
>> > > > > hash
>> > > > > > > is
>> > > > > > > > included in the idempotency binding and checked on replay.
>> > > > > > > >
>> > > > > > > > *Credential-bearing mutations (createTable)*
>> > > > > > > >
>> > > > > > > > On replay, instead of returning a stored response, the
>> handler
>> > > > calls
>> > > > > > > > buildLoadTableResponseWithDelegationCredentials with the
>> > existing
>> > > > > > table's
>> > > > > > > > metadata. This vends fresh credentials for the current
>> caller
>> > > > through
>> > > > > > the
>> > > > > > > > normal credential vending path. No credentials are stored in
>> > the
>> > > > > > > database —
>> > > > > > > > the idempotency record only stores the key, principal hash,
>> > table
>> > > > > > > > identifier, and status code.
>> > > > > > > >
>> > > > > > > > *Non-credential mutations (dropTable, renameTable,
>> > namespace/view
>> > > > > > > > operations)*
>> > > > > > > >
>> > > > > > > > These do not vend credentials, so there is no leakage
>> concern.
>> > On
>> > > > > > replay,
>> > > > > > > > the handler returns the stored status code and minimal
>> response
>> > > > body.
>> > > > > > > >
>> > > > > > > > *createTableStaged*
>> > > > > > > >
>> > > > > > > > Staged tables are not persisted to the catalog, so the
>> method
>> > can
>> > > > > > safely
>> > > > > > > > re-run on retry. Idempotency support for this endpoint can
>> be
>> > > > > deferred.
>> > > > > > > >
>> > > > > > > > *What is reused from the current implementation*
>> > > > > > > >
>> > > > > > > > The IdempotencyStore interface, database schema,
>> configuration,
>> > > and
>> > > > > > > > background purge logic are reused. The main change is moving
>> > the
>> > > > > > > > check/finalize logic from the filter into the handler.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > >
>> > > > > > > > Huaxin
>> > > > > > > >
>> > > > > > > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao <
>> > > [email protected]
>> > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Robert,
>> > > > > > > > >
>> > > > > > > > > Thanks for the follow-up.
>> > > > > > > > >
>> > > > > > > > > I appreciate the clarification. I read the listed items as
>> > fix
>> > > > > > options
>> > > > > > > > and
>> > > > > > > > > picked option 2, and the Phase 1/Phase 2 split was purely
>> to
>> > > keep
>> > > > > the
>> > > > > > > PR
>> > > > > > > > at
>> > > > > > > > > a reviewable size, not a design choice.
>> > > > > > > > >
>> > > > > > > > > I agree that the caller's identity should be part of the
>> > > binding.
>> > > > > > That
>> > > > > > > is
>> > > > > > > > > straightforward to add.
>> > > > > > > > >
>> > > > > > > > > On the architecture question, whether re-vending can work
>> > > within
>> > > > > the
>> > > > > > > > > filter or needs a handler-level approach, I want to give
>> this
>> > > > > proper
>> > > > > > > > > thought rather than rush an answer. I may not be able to
>> get
>> > > back
>> > > > > to
>> > > > > > > you
>> > > > > > > > on
>> > > > > > > > > this until next week (the week of 4/13).
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Huaxin
>> > > > > > > > >
>> > > > > > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <
>> [email protected]
>> > >
>> > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Hi Huaxin,
>> > > > > > > > >>
>> > > > > > > > >> Thanks for responding. A few corrections and
>> clarifications.
>> > > > > > > > >>
>> > > > > > > > >> My PR review described three consequences of the missing
>> > > caller
>> > > > > > > > >> identity binding and the credential leakage attack
>> vector.
>> > > Those
>> > > > > > > > >> were not suggestions for a Phase 1/Phase 2 split. The
>> > offline
>> > > > > > > > >> decision to split into phases, and what each phase
>> contains,
>> > > was
>> > > > > > > > >> yours [1].
>> > > > > > > > >>
>> > > > > > > > >> On the phasing: the concern is not PR size. The concern
>> is
>> > > that
>> > > > > > > > >> Phase 1 alone ships a full authorization bypass. The
>> filter
>> > > > > replays
>> > > > > > > > >> cached responses before Polaris's resource-level
>> > authorization
>> > > > > > > > >> runs. Stripping credentials does not fix that. Any
>> > > authenticated
>> > > > > > > > >> user with a valid idempotency key can receive a cached
>> > > response
>> > > > > > > > >> for a resource they are not authorized to access. That is
>> > the
>> > > > case
>> > > > > > > > >> with or without credentials in the response.
>> > > > > > > > >>
>> > > > > > > > >> "We will not release it until entire idempotency coding
>> is
>> > > > > > > > >> finished" is reassuring, but the architecture question
>> > > remains:
>> > > > > > > > >> can the generic filter approach support re-vending at
>> all?
>> > My
>> > > > > > > > >> email explained why I believe it cannot, since it would
>> > > require
>> > > > > > > > >> the filter to authorize the caller, resolve storage
>> > locations,
>> > > > > > > > >> call cloud IAM, and manage credential caching. That is
>> the
>> > > > catalog
>> > > > > > > > >> handler's job, not a filter's.
>> > > > > > > > >>
>> > > > > > > > >> The questions from my original email that still need
>> > answers:
>> > > > > > > > >>
>> > > > > > > > >> How does the binding prevent cross-principal cache hits
>> > > without
>> > > > > > > > >> including the caller's identity?
>> > > > > > > > >>
>> > > > > > > > >> How does Phase 2 re-vend credentials from a generic HTTP
>> > > filter
>> > > > > > > > >> without reimplementing the catalog handler's
>> authorization
>> > and
>> > > > > > > > >> credential vending logic?
>> > > > > > > > >>
>> > > > > > > > >> Has the operational overhead (3 DB round-trips per
>> mutation
>> > on
>> > > > > > > > >> the catalog's connection pool) been benchmarked on any
>> > > backend?
>> > > > > > > > >>
>> > > > > > > > >> I'd like to resolve these on this thread before more
>> code is
>> > > > > > > > >> written.
>> > > > > > > > >>
>> > > > > > > > >> Best,
>> > > > > > > > >> Robert
>> > > > > > > > >>
>> > > > > > > > >> [1]
>> > > > > > >
>> > > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
>> > > > > > > > >>
>> > > > > > > > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao <
>> > > > [email protected]
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi Robert,
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks for raising this on the mailing list. I
>> acknowledge
>> > > the
>> > > > > > > design
>> > > > > > > > >> > concerns you listed.
>> > > > > > > > >> >
>> > > > > > > > >> > To clarify: I picked your option 2 (strip credentials
>> and
>> > > > > re-vend
>> > > > > > on
>> > > > > > > > >> > replay) you suggested in the PR review. The offline
>> > > discussion
>> > > > > > with
>> > > > > > > > >> Yufei
>> > > > > > > > >> > and Prashant was simply to verify that I picked the
>> best
>> > > > option
>> > > > > > from
>> > > > > > > > all
>> > > > > > > > >> > the three suggestions you listed in the review.
>> > > > > > > > >> >
>> > > > > > > > >> > The reason I split it into two phases is that the
>> current
>> > PR
>> > > > is
>> > > > > > > > already
>> > > > > > > > >> > around 3000 lines. Adding re-vending logic on top would
>> > make
>> > > > it
>> > > > > > very
>> > > > > > > > >> hard
>> > > > > > > > >> > to review. This is a big feature and we need to do it
>> > piece
>> > > by
>> > > > > > > piece.
>> > > > > > > > We
>> > > > > > > > >> > will not release it until entire idempotency coding is
>> > > > finished.
>> > > > > > > > >> >
>> > > > > > > > >> > I'm happy to discuss the architecture further on this
>> > > thread,
>> > > > > > > whether
>> > > > > > > > >> the
>> > > > > > > > >> > filter approach can support re-vending, or whether we
>> > need a
>> > > > > > > > >> handler-level
>> > > > > > > > >> > design. I may be slow to respond next week but I'll
>> follow
>> > > up
>> > > > > > after
>> > > > > > > > >> that.
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks,
>> > > > > > > > >> >
>> > > > > > > > >> > Huaxin
>> > > > > > > > >> >
>> > > > > > > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <
>> > [email protected]
>> > > >
>> > > > > > wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Hi all,
>> > > > > > > > >> > >
>> > > > > > > > >> > > I need to raise some fundamental design concerns with
>> > the
>> > > > > > current
>> > > > > > > > >> > > Idempotency-Key implementation in PR #3803 [1]. Some
>> of
>> > > > these
>> > > > > > were
>> > > > > > > > >> > > raised in November 2025 [2] and were acknowledged [3]
>> > but
>> > > > > never
>> > > > > > > > >> > > addressed in the implementation.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I raised the caller identity concern on the
>> > > > > > > > >> > > mailing list in November 2025 [2]. In December 2025
>> > [4], I
>> > > > > > > > >> > > reiterated that all existing implementations I could
>> > find
>> > > do
>> > > > > > > > >> > > request fingerprinting including the request body,
>> and
>> > > > warned
>> > > > > > that
>> > > > > > > > >> > > the idempotency subsystem itself must not become a
>> > source
>> > > of
>> > > > > > > > >> > > failures. The PR was opened in February 2026 without
>> > > > > addressing
>> > > > > > > > >> > > either point.
>> > > > > > > > >> > >
>> > > > > > > > >> > > The credential leakage consequences surfaced during
>> code
>> > > > > review
>> > > > > > > > >> > > [5]. The response was an offline decision between
>> three
>> > > > > > > > contributors,
>> > > > > > > > >> > > without mailing list discussion [6]. The decision
>> was to
>> > > > split
>> > > > > > > > >> > > into Phase 1 (strip credentials, which produces
>> broken
>> > > > > responses
>> > > > > > > > >> > > but does not fix the authorization bypass) and Phase
>> 2
>> > > > > (re-vend
>> > > > > > > > >> > > on replay, which would require a complete redesign
>> and
>> > > > > > > > >> > > reimplementation). Details below.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I think we need to resolve these at the design level
>> > > before
>> > > > > more
>> > > > > > > > >> > > code is written.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I will not do further code reviews on this PR until
>> > these
>> > > > > design
>> > > > > > > > >> > > concerns are resolved.
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > 1. Caller identity is not part of the idempotency
>> > binding
>> > > > > > > > >> > >
>> > > > > > > > >> > > The caller's identity is not part of the binding. The
>> > > > > > > > >> > > filter replays cached responses before Polaris's
>> > > > > resource-level
>> > > > > > > > >> > > authorization runs. Any authenticated user with a
>> valid
>> > > > > > > idempotency
>> > > > > > > > >> > > key can receive a cached response for a resource they
>> > are
>> > > > not
>> > > > > > > > >> > > authorized to access, including vended storage
>> > > credentials.
>> > > > > This
>> > > > > > > > >> > > is a full authorization bypass.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Idempotency keys are HTTP headers that end up in
>> access
>> > > > logs,
>> > > > > > > > >> > > observability tooling, debug output. They cannot be a
>> > > > security
>> > > > > > > > >> > > boundary. I described the attack vector in my PR
>> review
>> > > [5].
>> > > > > > > > >> > >
>> > > > > > > > >> > > The proposed Phase 2 [6] says it will "re-authorize
>> the
>> > > user
>> > > > > and
>> > > > > > > > >> > > re-vend fresh credentials on replay." That addresses
>> > > > > credentials
>> > > > > > > > >> > > but not the binding itself.
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > 2. Credential stripping breaks Polaris's deployment
>> > model
>> > > > > > > > >> > >
>> > > > > > > > >> > > Without mailing list discussion, the decision was
>> made
>> > > > offline
>> > > > > > > > >> > > to strip credentials before storing ("Phase 1") [6].
>> > > > > > > > >> > >
>> > > > > > > > >> > > Polaris's purpose is to vend down-scoped credentials.
>> > > > Clients
>> > > > > > have
>> > > > > > > > >> > > no ambient storage access.
>> > > > > > > > >> > >
>> > > > > > > > >> > > A stripped response returns a successful response
>> > ("table
>> > > > > > > created")
>> > > > > > > > >> > > to the client, which then fails hard because it
>> cannot
>> > > > access
>> > > > > > the
>> > > > > > > > >> > > object storage. Query engines (Spark, Trino, etc.)
>> will
>> > > fail
>> > > > > on
>> > > > > > > > >> > > the first storage operation against that table. Query
>> > > engine
>> > > > > > users
>> > > > > > > > >> > > have no way to resolve this, they don't control the
>> REST
>> > > > > catalog
>> > > > > > > > >> > > interaction. For stage-create, the client gets a
>> > > successful
>> > > > > > > > >> > > response telling it to write data but has no
>> credentials
>> > > to
>> > > > > > write
>> > > > > > > > >> > > with. No fallback exists.
>> > > > > > > > >> > >
>> > > > > > > > >> > > The stripped response is spec-compliant, but
>> > > spec-compliant
>> > > > > and
>> > > > > > > > >> > > operationally correct are not the same thing.
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > 3. Phase 2 is architecturally incompatible with
>> Phase 1
>> > > > > > > > >> > >
>> > > > > > > > >> > > The filter is a generic HTTP filter below the
>> > application
>> > > > > layer.
>> > > > > > > It
>> > > > > > > > >> > > treats responses as opaque JSON. Credential vending
>> > > happens
>> > > > > deep
>> > > > > > > > >> > > inside IcebergCatalogHandler, scoped to specific
>> table
>> > > > > locations
>> > > > > > > > >> > > and principals, through cloud IAM calls (STS
>> AssumeRole,
>> > > GCS
>> > > > > > token
>> > > > > > > > >> > > exchange).
>> > > > > > > > >> > >
>> > > > > > > > >> > > Re-vending on replay would require the filter to
>> resolve
>> > > the
>> > > > > > > > >> > > table's storage location, authorize the caller, call
>> > cloud
>> > > > IAM
>> > > > > > to
>> > > > > > > > >> > > mint scoped credentials, and manage credential
>> caching.
>> > > That
>> > > > > is
>> > > > > > > > >> > > not a filter, that is reimplementing the catalog
>> > handler's
>> > > > > > > > >> > > authorization and credential vending logic.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Phase 1 and Phase 2 need to be designed and shipped
>> > > > together.
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > 4. Operational overhead
>> > > > > > > > >> > >
>> > > > > > > > >> > > Every idempotent mutation adds 3 database round-trips
>> > > > (INSERT,
>> > > > > > > > UPDATE,
>> > > > > > > > >> > > heartbeat UPDATEs), each borrowing its own connection
>> > > > > > > > >> > > (autoCommit=true, no reuse), sharing the catalog's
>> > > > connection
>> > > > > > pool
>> > > > > > > > >> > > and RDBMS. This added latency increases the
>> probability
>> > of
>> > > > > > client
>> > > > > > > > >> > > timeouts, which are exactly the transient failures
>> that
>> > > > > > > idempotency
>> > > > > > > > >> > > is supposed to prevent.
>> > > > > > > > >> > >
>> > > > > > > > >> > > The idempotency table is high-churn and write-hot,
>> > > competing
>> > > > > > with
>> > > > > > > > >> > > catalog operations for connection pool capacity. The
>> > > > overhead
>> > > > > > gets
>> > > > > > > > >> > > worse on managed and distributed database backends.
>> > > > Duplicate
>> > > > > > > > >> > > requests trigger polling loops with repeated database
>> > > reads.
>> > > > > > > > >> > >
>> > > > > > > > >> > > No benchmark data has been presented for any backend.
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > What needs to change
>> > > > > > > > >> > >
>> > > > > > > > >> > > The current generic-filter approach cannot support
>> these
>> > > > > > > > >> > > requirements. This needs a different design.
>> > > > > > > > >> > >
>> > > > > > > > >> > > A new design must address the following:
>> > > > > > > > >> > >
>> > > > > > > > >> > > a) The idempotency binding MUST include the caller's
>> > > > principal
>> > > > > > > > >> > >    identity.
>> > > > > > > > >> > > b) Credential-bearing responses must not be stored
>> > as-is.
>> > > On
>> > > > > > > > >> > >    replay, the handler must build a fresh response
>> with
>> > > > fresh
>> > > > > > > > >> > >    credentials for the current caller.
>> > > > > > > > >> > > c) The operational overhead needs to be benchmarked.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I am happy to review code once the design addresses
>> > these
>> > > > > > points.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Best,
>> > > > > > > > >> > > Robert
>> > > > > > > > >> > >
>> > > > > > > > >> > > [1] https://github.com/apache/polaris/pull/3803
>> > > > > > > > >> > > [2]
>> > > > > > > >
>> > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l
>> > > > > > > > >> > >     (Nov 24, 2025: "Building a service-internal key
>> from
>> > > the
>> > > > > > > client
>> > > > > > > > >> > >     provided idempotency key plus more data from the
>> > > request
>> > > > > > gives
>> > > > > > > > >> > >     better uniqueness. That more data can come from:
>> > > > operation
>> > > > > > id,
>> > > > > > > > >> > >     all operation parameters, including the whole
>> > payload,
>> > > > > > quite a
>> > > > > > > > >> > >     few HTTP request headers like user-agent,
>> > > authorization,
>> > > > > > host,
>> > > > > > > > >> > >     accept-* and also the client's address.")
>> > > > > > > > >> > > [3]
>> > > > > > > >
>> > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b
>> > > > > > > > >> > >     (Dec 8, 2025: "I agree that a bare
>> Idempotency-Key +
>> > > > > entity
>> > > > > > > > >> > >     identifier is not enough to protect against
>> buggy or
>> > > > > > malicious
>> > > > > > > > >> > >     clients; fingerprinting the full request would be
>> > > > > > stronger.")
>> > > > > > > > >> > > [4]
>> > > > > > > >
>> > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264
>> > > > > > > > >> > >     (Dec 8, 2025: "We should avoid failing Polaris if
>> > this
>> > > > > > > subsystem
>> > > > > > > > >> > >     fails, or letting this subsystem be a reason for
>> its
>> > > > > > existence
>> > > > > > > > >> > >     (aka retry due to timeouts because the
>> > > > idempotent-request
>> > > > > > > > >> > >     subsystem hangs).")
>> > > > > > > > >> > > [5]
>> > > > > > > > >> > >
>> > > > > > > > >>
>> > > > > > > >
>> > > > > >
>> > > >
>> >
>> https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055
>> > > > > > > > >> > > [6]
>> > > > > > > > >>
>> > > > >
>> https://github.com/apache/polaris/pull/3803#issuecomment-4180969838
>> > > > > > > > >> > >
>> > > > > > > > >> >
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Design concerns with Idempotency-Key implementation (PR #3803)

Reply via email to