Hi Huaxin, Yufei, I left some comments on [4269] mostly about SPI design concerns.
Since we wanted to reopen a more general SPI discussion anyway, it might be worth having a dedicated sync about that... Alternatively, we could have that discussion by email now, if you prefer. I have some starting points to share for discussion. WDYT? [4269] https://github.com/apache/polaris/pull/4269 Thanks, Dmitri. On Fri, Apr 17, 2026 at 7:07 PM huaxin gao <[email protected]> wrote: > Hi Robert, > > Thanks, and I’m glad we’re aligned on the handler-level direction. I also > appreciate the detailed review and feedback throughout this discussion. > > Since the old implementation is going away and the handler-level > implementation will be in a fresh PR, would you be OK removing the -1 from > PR #3803 <https://github.com/apache/polaris/pull/3803>? If so, I’ll close > the old PR and open a new one for the handler-level approach. > > Best, > > Huaxin > > On Thu, Apr 16, 2026 at 12:57 AM Robert Stupp <[email protected]> wrote: > > > i Yufei, Huaxin, > > > > Thanks for the thoughtful replies, and for opening the issue for > > `createTableStaged` [1]. I’m glad we’re converging on the handler-level > > direction. > > > > I agree benchmarking can run in parallel with implementation. For my own > > review, I’d prefer to wait until reproducible benchmark results are > posted > > (using the scope from my April 14 note, or with any scope changes called > > out). > > > > Happy to review once that is available. > > > > Best, > > Robert > > > > [1] https://github.com/apache/polaris/issues/4200 > > > > > > On Tue, Apr 14, 2026 at 7:54 PM huaxin gao <[email protected]> > wrote: > > > > > Hi Robert, > > > > > > Thanks for the feedback. Glad the handler-level design addresses the > > > concerns. > > > > > > I also feel we should converge on the design and implementation first, > > and > > > run benchmarks in parallel. The access patterns will be different with > > the > > > handler-level approach, so benchmarking the current filter-based code > > > wouldn't be representative. > > > > > > I'll open a GitHub issue to track createTableStaged support, and I'll > > open > > > a fresh PR for the handler-level implementation. > > > > > > Best, > > > > > > Huaxin > > > > > > On Tue, Apr 14, 2026 at 10:30 AM Yufei Gu <[email protected]> > wrote: > > > > > > > Hi Robert, > > > > > > > > Thanks for the thorough review and for clearly laying out the > concerns, > > > > this is very helpful. > > > > > > > > For point 1(Q3), I do not think this needs to be a hard prerequisite > > > before > > > > looking at the implementation. > > > > > > > > The handler level design and correctness concerns are largely > > independent > > > > from the exact performance characteristics. It would be more > productive > > > to > > > > first converge on the right architecture, then validate and iterate > on > > > > performance with concrete measurements. Also, while the store is > > reused, > > > > the access pattern and critical sections change with the handler > level > > > > approach. Measuring the current implementation in isolation may not > > > reflect > > > > the actual behavior of the final design. > > > > > > > > We already have a similar situation with the NoSQL benchmark, where > the > > > > code has been merged for a while but the benchmarking work is still > in > > > > progress. That has not blocked design or code review, and we are > > > iterating > > > > on performance as we go. > > > > > > > > I agree that realistic benchmarking is important. But I would suggest > > we > > > > treat this as a parallel track, not a gate. Otherwise we risk > blocking > > > > progress on the design without knowing whether the measured overhead > is > > > > even representative of the final system. > > > > > > > > Happy to help define a benchmark plan once the design is settled. > > > > > > > > Yufei > > > > > > > > > > > > On Tue, Apr 14, 2026 at 6:28 AM Robert Stupp <[email protected]> wrote: > > > > > > > > > Hi Huaxin, > > > > > > > > > > yeah, the handler-level approach is the right call, I'm glad you > went > > > > with > > > > > it. > > > > > > > > > > To be explicit about what your proposal aims to resolve: > > > > > > > > > > - Authorization bypass is fixed. The check now runs after > > > resource-level > > > > > authorization. > > > > > - No credentials in the database. Fresh credentials re-vended on > > replay > > > > via > > > > > the normal vending path. > > > > > - Phase 2 architecture problem gone. The handler has the principal, > > > > storage > > > > > config, and IAM access that the filter never could have. > > > > > - Caller identity is in the binding. > > > > > > > > > > Three things still need adressing: > > > > > > > > > > First, and this is a hard prerequisite before I look at any > > > > implementation > > > > > code: Q3 from April 5 is still open. The store is being reused > as-is, > > > so > > > > > the overhead can be measured right now without new handler code. 3+ > > DB > > > > > round-trips per mutation on the catalog connection pool, write-hot > > > table, > > > > > polling loops under duplicate traffic - those numbers need to come > > > from a > > > > > realistic setup: networked database under concurrent load, not a > > local > > > > > Postgres. The latency concerns are specifically about Aurora and > > > > > CockroachDB, and the polling storm only shows up under concurrent > > > > duplicate > > > > > traffic. Please share the methodology so the results are > > reproducible. > > > > > > > > > > Second: please open a GitHub issue for the createTableStaged > > deferral. > > > > > > > > > > Third: since the filter is going away entirely and the critical > > > sections > > > > > move to the handlers, does it make sense to close #3803 and open a > > > fresh > > > > > PR? The old diff is mostly code that's being deleted, and it > carries > > a > > > > lot > > > > > of review history. A clean PR would be easier for everyone to > review > > - > > > > just > > > > > reference the old one for context. > > > > > > > > > > Happy to do a fresh review once the new implementation is up. > > > > > > > > > > Best, > > > > > Robert > > > > > > > > > > > > > > > On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <[email protected] > > > > > > wrote: > > > > > > > > > > > Hi Robert, > > > > > > > > > > > > Here is a revised design proposal for the idempotency replay > > > mechanism. > > > > > I'd > > > > > > appreciate your feedback. Thanks! > > > > > > > > > > > > *Moving idempotency to handler level* > > > > > > > > > > > > The idempotency check moves from the HTTP filter into the handler > > > > > methods, > > > > > > after authorization. The filter is removed. The caller's > principal > > > hash > > > > > is > > > > > > included in the idempotency binding and checked on replay. > > > > > > > > > > > > *Credential-bearing mutations (createTable)* > > > > > > > > > > > > On replay, instead of returning a stored response, the handler > > calls > > > > > > buildLoadTableResponseWithDelegationCredentials with the existing > > > > table's > > > > > > metadata. This vends fresh credentials for the current caller > > through > > > > the > > > > > > normal credential vending path. No credentials are stored in the > > > > > database — > > > > > > the idempotency record only stores the key, principal hash, table > > > > > > identifier, and status code. > > > > > > > > > > > > *Non-credential mutations (dropTable, renameTable, namespace/view > > > > > > operations)* > > > > > > > > > > > > These do not vend credentials, so there is no leakage concern. On > > > > replay, > > > > > > the handler returns the stored status code and minimal response > > body. > > > > > > > > > > > > *createTableStaged* > > > > > > > > > > > > Staged tables are not persisted to the catalog, so the method can > > > > safely > > > > > > re-run on retry. Idempotency support for this endpoint can be > > > deferred. > > > > > > > > > > > > *What is reused from the current implementation* > > > > > > > > > > > > The IdempotencyStore interface, database schema, configuration, > and > > > > > > background purge logic are reused. The main change is moving the > > > > > > check/finalize logic from the filter into the handler. > > > > > > > > > > > > Best, > > > > > > > > > > > > Huaxin > > > > > > > > > > > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > > > Hi Robert, > > > > > > > > > > > > > > Thanks for the follow-up. > > > > > > > > > > > > > > I appreciate the clarification. I read the listed items as fix > > > > options > > > > > > and > > > > > > > picked option 2, and the Phase 1/Phase 2 split was purely to > keep > > > the > > > > > PR > > > > > > at > > > > > > > a reviewable size, not a design choice. > > > > > > > > > > > > > > I agree that the caller's identity should be part of the > binding. > > > > That > > > > > is > > > > > > > straightforward to add. > > > > > > > > > > > > > > On the architecture question, whether re-vending can work > within > > > the > > > > > > > filter or needs a handler-level approach, I want to give this > > > proper > > > > > > > thought rather than rush an answer. I may not be able to get > back > > > to > > > > > you > > > > > > on > > > > > > > this until next week (the week of 4/13). > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Huaxin > > > > > > > > > > > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <[email protected]> > > > wrote: > > > > > > > > > > > > > >> Hi Huaxin, > > > > > > >> > > > > > > >> Thanks for responding. A few corrections and clarifications. > > > > > > >> > > > > > > >> My PR review described three consequences of the missing > caller > > > > > > >> identity binding and the credential leakage attack vector. > Those > > > > > > >> were not suggestions for a Phase 1/Phase 2 split. The offline > > > > > > >> decision to split into phases, and what each phase contains, > was > > > > > > >> yours [1]. > > > > > > >> > > > > > > >> On the phasing: the concern is not PR size. The concern is > that > > > > > > >> Phase 1 alone ships a full authorization bypass. The filter > > > replays > > > > > > >> cached responses before Polaris's resource-level authorization > > > > > > >> runs. Stripping credentials does not fix that. Any > authenticated > > > > > > >> user with a valid idempotency key can receive a cached > response > > > > > > >> for a resource they are not authorized to access. That is the > > case > > > > > > >> with or without credentials in the response. > > > > > > >> > > > > > > >> "We will not release it until entire idempotency coding is > > > > > > >> finished" is reassuring, but the architecture question > remains: > > > > > > >> can the generic filter approach support re-vending at all? My > > > > > > >> email explained why I believe it cannot, since it would > require > > > > > > >> the filter to authorize the caller, resolve storage locations, > > > > > > >> call cloud IAM, and manage credential caching. That is the > > catalog > > > > > > >> handler's job, not a filter's. > > > > > > >> > > > > > > >> The questions from my original email that still need answers: > > > > > > >> > > > > > > >> How does the binding prevent cross-principal cache hits > without > > > > > > >> including the caller's identity? > > > > > > >> > > > > > > >> How does Phase 2 re-vend credentials from a generic HTTP > filter > > > > > > >> without reimplementing the catalog handler's authorization and > > > > > > >> credential vending logic? > > > > > > >> > > > > > > >> Has the operational overhead (3 DB round-trips per mutation on > > > > > > >> the catalog's connection pool) been benchmarked on any > backend? > > > > > > >> > > > > > > >> I'd like to resolve these on this thread before more code is > > > > > > >> written. > > > > > > >> > > > > > > >> Best, > > > > > > >> Robert > > > > > > >> > > > > > > >> [1] > > > > > > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > > > > > > >> > > > > > > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao < > > [email protected] > > > > > > > > > > wrote: > > > > > > >> > > > > > > >> > Hi Robert, > > > > > > >> > > > > > > > >> > Thanks for raising this on the mailing list. I acknowledge > the > > > > > design > > > > > > >> > concerns you listed. > > > > > > >> > > > > > > > >> > To clarify: I picked your option 2 (strip credentials and > > > re-vend > > > > on > > > > > > >> > replay) you suggested in the PR review. The offline > discussion > > > > with > > > > > > >> Yufei > > > > > > >> > and Prashant was simply to verify that I picked the best > > option > > > > from > > > > > > all > > > > > > >> > the three suggestions you listed in the review. > > > > > > >> > > > > > > > >> > The reason I split it into two phases is that the current PR > > is > > > > > > already > > > > > > >> > around 3000 lines. Adding re-vending logic on top would make > > it > > > > very > > > > > > >> hard > > > > > > >> > to review. This is a big feature and we need to do it piece > by > > > > > piece. > > > > > > We > > > > > > >> > will not release it until entire idempotency coding is > > finished. > > > > > > >> > > > > > > > >> > I'm happy to discuss the architecture further on this > thread, > > > > > whether > > > > > > >> the > > > > > > >> > filter approach can support re-vending, or whether we need a > > > > > > >> handler-level > > > > > > >> > design. I may be slow to respond next week but I'll follow > up > > > > after > > > > > > >> that. > > > > > > >> > > > > > > > >> > Thanks, > > > > > > >> > > > > > > > >> > Huaxin > > > > > > >> > > > > > > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <[email protected] > > > > > > wrote: > > > > > > >> > > > > > > > >> > > Hi all, > > > > > > >> > > > > > > > > >> > > I need to raise some fundamental design concerns with the > > > > current > > > > > > >> > > Idempotency-Key implementation in PR #3803 [1]. Some of > > these > > > > were > > > > > > >> > > raised in November 2025 [2] and were acknowledged [3] but > > > never > > > > > > >> > > addressed in the implementation. > > > > > > >> > > > > > > > > >> > > I raised the caller identity concern on the > > > > > > >> > > mailing list in November 2025 [2]. In December 2025 [4], I > > > > > > >> > > reiterated that all existing implementations I could find > do > > > > > > >> > > request fingerprinting including the request body, and > > warned > > > > that > > > > > > >> > > the idempotency subsystem itself must not become a source > of > > > > > > >> > > failures. The PR was opened in February 2026 without > > > addressing > > > > > > >> > > either point. > > > > > > >> > > > > > > > > >> > > The credential leakage consequences surfaced during code > > > review > > > > > > >> > > [5]. The response was an offline decision between three > > > > > > contributors, > > > > > > >> > > without mailing list discussion [6]. The decision was to > > split > > > > > > >> > > into Phase 1 (strip credentials, which produces broken > > > responses > > > > > > >> > > but does not fix the authorization bypass) and Phase 2 > > > (re-vend > > > > > > >> > > on replay, which would require a complete redesign and > > > > > > >> > > reimplementation). Details below. > > > > > > >> > > > > > > > > >> > > I think we need to resolve these at the design level > before > > > more > > > > > > >> > > code is written. > > > > > > >> > > > > > > > > >> > > I will not do further code reviews on this PR until these > > > design > > > > > > >> > > concerns are resolved. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > 1. Caller identity is not part of the idempotency binding > > > > > > >> > > > > > > > > >> > > The caller's identity is not part of the binding. The > > > > > > >> > > filter replays cached responses before Polaris's > > > resource-level > > > > > > >> > > authorization runs. Any authenticated user with a valid > > > > > idempotency > > > > > > >> > > key can receive a cached response for a resource they are > > not > > > > > > >> > > authorized to access, including vended storage > credentials. > > > This > > > > > > >> > > is a full authorization bypass. > > > > > > >> > > > > > > > > >> > > Idempotency keys are HTTP headers that end up in access > > logs, > > > > > > >> > > observability tooling, debug output. They cannot be a > > security > > > > > > >> > > boundary. I described the attack vector in my PR review > [5]. > > > > > > >> > > > > > > > > >> > > The proposed Phase 2 [6] says it will "re-authorize the > user > > > and > > > > > > >> > > re-vend fresh credentials on replay." That addresses > > > credentials > > > > > > >> > > but not the binding itself. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > 2. Credential stripping breaks Polaris's deployment model > > > > > > >> > > > > > > > > >> > > Without mailing list discussion, the decision was made > > offline > > > > > > >> > > to strip credentials before storing ("Phase 1") [6]. > > > > > > >> > > > > > > > > >> > > Polaris's purpose is to vend down-scoped credentials. > > Clients > > > > have > > > > > > >> > > no ambient storage access. > > > > > > >> > > > > > > > > >> > > A stripped response returns a successful response ("table > > > > > created") > > > > > > >> > > to the client, which then fails hard because it cannot > > access > > > > the > > > > > > >> > > object storage. Query engines (Spark, Trino, etc.) will > fail > > > on > > > > > > >> > > the first storage operation against that table. Query > engine > > > > users > > > > > > >> > > have no way to resolve this, they don't control the REST > > > catalog > > > > > > >> > > interaction. For stage-create, the client gets a > successful > > > > > > >> > > response telling it to write data but has no credentials > to > > > > write > > > > > > >> > > with. No fallback exists. > > > > > > >> > > > > > > > > >> > > The stripped response is spec-compliant, but > spec-compliant > > > and > > > > > > >> > > operationally correct are not the same thing. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > 3. Phase 2 is architecturally incompatible with Phase 1 > > > > > > >> > > > > > > > > >> > > The filter is a generic HTTP filter below the application > > > layer. > > > > > It > > > > > > >> > > treats responses as opaque JSON. Credential vending > happens > > > deep > > > > > > >> > > inside IcebergCatalogHandler, scoped to specific table > > > locations > > > > > > >> > > and principals, through cloud IAM calls (STS AssumeRole, > GCS > > > > token > > > > > > >> > > exchange). > > > > > > >> > > > > > > > > >> > > Re-vending on replay would require the filter to resolve > the > > > > > > >> > > table's storage location, authorize the caller, call cloud > > IAM > > > > to > > > > > > >> > > mint scoped credentials, and manage credential caching. > That > > > is > > > > > > >> > > not a filter, that is reimplementing the catalog handler's > > > > > > >> > > authorization and credential vending logic. > > > > > > >> > > > > > > > > >> > > Phase 1 and Phase 2 need to be designed and shipped > > together. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > 4. Operational overhead > > > > > > >> > > > > > > > > >> > > Every idempotent mutation adds 3 database round-trips > > (INSERT, > > > > > > UPDATE, > > > > > > >> > > heartbeat UPDATEs), each borrowing its own connection > > > > > > >> > > (autoCommit=true, no reuse), sharing the catalog's > > connection > > > > pool > > > > > > >> > > and RDBMS. This added latency increases the probability of > > > > client > > > > > > >> > > timeouts, which are exactly the transient failures that > > > > > idempotency > > > > > > >> > > is supposed to prevent. > > > > > > >> > > > > > > > > >> > > The idempotency table is high-churn and write-hot, > competing > > > > with > > > > > > >> > > catalog operations for connection pool capacity. The > > overhead > > > > gets > > > > > > >> > > worse on managed and distributed database backends. > > Duplicate > > > > > > >> > > requests trigger polling loops with repeated database > reads. > > > > > > >> > > > > > > > > >> > > No benchmark data has been presented for any backend. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > What needs to change > > > > > > >> > > > > > > > > >> > > The current generic-filter approach cannot support these > > > > > > >> > > requirements. This needs a different design. > > > > > > >> > > > > > > > > >> > > A new design must address the following: > > > > > > >> > > > > > > > > >> > > a) The idempotency binding MUST include the caller's > > principal > > > > > > >> > > identity. > > > > > > >> > > b) Credential-bearing responses must not be stored as-is. > On > > > > > > >> > > replay, the handler must build a fresh response with > > fresh > > > > > > >> > > credentials for the current caller. > > > > > > >> > > c) The operational overhead needs to be benchmarked. > > > > > > >> > > > > > > > > >> > > I am happy to review code once the design addresses these > > > > points. > > > > > > >> > > > > > > > > >> > > Best, > > > > > > >> > > Robert > > > > > > >> > > > > > > > > >> > > [1] https://github.com/apache/polaris/pull/3803 > > > > > > >> > > [2] > > > > > > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l > > > > > > >> > > (Nov 24, 2025: "Building a service-internal key from > the > > > > > client > > > > > > >> > > provided idempotency key plus more data from the > request > > > > gives > > > > > > >> > > better uniqueness. That more data can come from: > > operation > > > > id, > > > > > > >> > > all operation parameters, including the whole payload, > > > > quite a > > > > > > >> > > few HTTP request headers like user-agent, > authorization, > > > > host, > > > > > > >> > > accept-* and also the client's address.") > > > > > > >> > > [3] > > > > > > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b > > > > > > >> > > (Dec 8, 2025: "I agree that a bare Idempotency-Key + > > > entity > > > > > > >> > > identifier is not enough to protect against buggy or > > > > malicious > > > > > > >> > > clients; fingerprinting the full request would be > > > > stronger.") > > > > > > >> > > [4] > > > > > > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264 > > > > > > >> > > (Dec 8, 2025: "We should avoid failing Polaris if this > > > > > subsystem > > > > > > >> > > fails, or letting this subsystem be a reason for its > > > > existence > > > > > > >> > > (aka retry due to timeouts because the > > idempotent-request > > > > > > >> > > subsystem hangs).") > > > > > > >> > > [5] > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055 > > > > > > >> > > [6] > > > > > > >> > > > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
