Re: [DISCUSS] Credential Vending Refactor (PR #3699)

Tornike Gurgenidze Sat, 04 Apr 2026 02:23:49 -0700

Hi Dennis,

Thanks a lot for such a detailed review. I understand the use case and the
reasons for the current design much better now.


1. Solidify some variation of PolarisCredentialVendor as the interface

definition for *using* a PolarisStorageIntegration to get a credential.
> This doesn't *need* to be implemented by monolithic MetaStoreManager impls,
> but it still needs to exist as an optionally persistence-aware interface


That sounds good to me, we can bring the interface back. iiuc, oss impl
will no longer have anything to do with MetastoreManager, while custom impl
will be free to integrate with IntegrationPersistence. We will simply have
to keep PolarisCredentialVendor as an SPI + relevant methods in
IntegrationPersistence around even if unused in oss. Let's iterate on the
actual interface signature in the PR.

2. Solidify some variation of PolarisStorageIntegrationProvider as the

interface definition for *constructing/leasing* a new
> PolarisStorageIntegration strictly for CreateCatalog (or Create for any

entity that might hold a StorageConfig)


Makes sense. PR changes PolarisStorageIntegrations to be singletons (one
per integration type) and PolarisStorageIntegration objects accept
StorageConfig directly. I'm assuming that will probably be too awkward with
config/integration separation you described in mind. I can change oss
PolarisStorageIntegrationProvider impl to serve per-storageConfig
singletons instead. That way we can avoid including StorageConfig into the
StorageIntegration method signature.

3. Maybe reconcile those two "using" and "constructing" methods under a
> single unified interface that is still injectable

I'm sure I won't be able to come up with a good enough name for the
combined SPI :)

6. We can evolve the PolarisCredentialVendor method interface to take a
> whole PolarisEntity instead of entityId, type, etc., to potentially avoid a
> re-lookup. Such a syntactic change is still *semantically* compatible with
> the SPI because any impls that used to need enttyId can still call
> entity.getId() if we pass in the whole entity


Let's iterate on the PR. I think a single PolarisEntity might not be
sufficient unfortunately because of the storage name override PR (
https://github.com/apache/polaris/pull/4023). The resolution there
effectively depends on constructing StorageConfig from several entities
(plus application config). Current implementation in this PR hoped to make
that easy by bypassing PolarisEntity altogether and passing constructed
StorageConfig directly, but now I understand that will complicate
persistence for your custom CredentialVendor implementation. Maybe passing
List<PolarisEntity> for the whole resolved path instead of a single
"resolved" PolarisEntity is the way to go?

7. Alternatively, if we really want to get callsites to call

getSubscopedCreds directly on a PolarisStorageIntegration object, maybe
> it's possible to model the persistence cooperation in subclasses of
> PolarisStorageIntegration itself, but we still need a way to distinguish
> between whether a caller is *constructing* a new StorageIntegration or

*using* an entity-linked StorageIntegration.


I think that would make the design more coupled and complicated, probably
not worth it.

Once again, thanks for a thorough review. I'll start adapting the PR with
these in mind.

On Fri, Apr 3, 2026 at 4:15 AM Dennis Huo <[email protected]> wrote:

> Thanks for starting this discussion! Some refactoring does seem helpful in
> this area. However, I think there are some elements of the currently
> entrenched SPIs to discuss. Specifically, the coordination with the
> persistence layer is an intentional feature, even though we should probably
> encapsulate it to simplify the cases where it's not needed by the
> persistence layer. Reposting some of my comments from the PR:
>
> Though we should indeed be able to *decouple* the various interfaces pulled
> in/aggregated by `PolarisMetaStoreManager`:
>
>         PolarisSecretsManager,
>         PolarisGrantManager,
>         PolarisCredentialVendor,
>         PolarisPolicyMappingManager,
>         PolarisEventManager,
>         PolarisMetricsManager
>
> those interfaces were specifically written as part of SPIs, and at least
> the same *functionality* provided by those extensibility points needs to be
> preserved more strictly than syntax evolution of those interfaces, unless a
> much deeper attempt is made amongst Polaris users to agree on a migration
> path off of such functionality.
>
> There's some discussion about the list of core SPIs in here:
> https://lists.apache.org/thread/0nj24zro7kyctqfnlml08ppo7zs9xcqs
>
> The way that applies to this PR is that the interaction with the
> persistence layer is intentional, especially for the
> TransactionalMetaStoreManager path.
>
> The basic use case if if a custom Polaris deployment uses subclasses that
> leverage those SPIs in order to perform additional book-keeping around
> storage-integration functionality and needs that to be transactional with
> entity creation/updates. This is best highlighted by thinking about the
> interaction between the subset of `PolarisCredentialVendor` method
> implementations with `IntegrationPersistence` which has these methods:
>
>     createStorageIntegration
>     persistStorageIntegrationIfNeeded
>     loadPolarisStorageIntegration
>
> These are in the interface specifically to define the SPI by which users
> customize a "lease, commit, use" model for secondary metadata necessary for
> StorageIntegrations. In this current PR, it's not clear what happens to
> those methods now -- will
> `IntegrationPersistence::loadPolarisStorageIntegration` just be ignored?
>
> Overall, I think the refactoring on the credential caching layer should
> still be possible, while preserving the `PolarisCredentialVendor` interface
> so that implementations of `PolarisCredentialVendor` still work correctly.
>
> The use case to preserve is as follows:
>
> 1. Suppose the running Polaris service doesn't just use a static/fixed
> source IAM User from environment variables, but instead maintains a *pool*
> of different source-IAM-user credentials in an offloaded sidecar
> 2. Each new Catalog/StorageConfig created is intended to lease a new
> identity from the *pool* and keep track of that lease atomically with the
> CatalogEntity
> 3. The CreateCatalog flow as traced through
> `TransactionalMetaStoreManagerImpl.createcatalog` is as follows:
>     a. Prepare catalog entity info in-memory
>     b. Call IntegrationPersistence.createStorageIntegrationInCurrentTxn --
> this will call the sidecar credential-pool layer to *lease* a new IAM user
> by reference
>     c. Start *transaction* for Polaris persistence layer
>     d. Lookups to validate current transaction state
>     e. In the same transaction that persists the new CatalogEntity,
> calls persistStorageIntegrationIfNeededInCurrentTxn - this may be custom to
> different Polaris service owners, where a book-keeping table is updated
> cooperatively with the sidecar credential-pool layer to consume the leased
> IAM user and store the fact that the new CatalogEntity's entityId is
> tightly coupled to that leased IAM user
>     f. Commit transaction all in one piece
> 4. The getSubscopedCreds flow would then be as follows:
>     a. Load entity again based on id (this is indeed probably redundant and
> we could probably refactor this)
>     b. Call
> IntegrationPersistence.loadPolarisStorageIntegrationInCurrentTxn with the
> retrieved linked PolarisEntity
>     c. The custom service instance can have an impl of
> IntegrationPersistence here that uses the *stateful* book-keeping of the
> leased credential-pool object to return a handle into that sidecar
> credential-pool
>     d. The StorageIntegration returned may not be the same thing as the
> original object from "createStorageIntegrationInCurrentTxn" - it might be a
> handle to a temporary authenticated connection into the sidecar
> credential-vendor, for example
>     e. This new stateful and entity-coupled PolarisStorageIntegration is
> ultimately used to getSubscopedCreds
>
> Today, I think the main cause of confusion is that we only have
> "placeholder" implementations in the main repo of
> loadPolarisStorageIntegrationInCurrentTxn that conflate "lease/construct
> StorageIntegration" with "fetch a vending-ready StorageIntegration". This
> was because credential-pooling semantics tend to be specific to each
> individual Polaris service-runner, and it'd be a lot of complexity to have
> a "toy" implementation of credential-pooling for single-tenant use cases
> that don't need it.
>
> I think the overall flow could still be preserved while performing most of
> the proposed refactors if we:
>
> 1. Solidify some variation of PolarisCredentialVendor as the interface
> definition for *using* a PolarisStorageIntegration to get a credential.
> This doesn't *need* to be implemented by monolithic MetaStoreManager impls,
> but it still needs to exist as an optionally persistence-aware interface
> 2. Solidify some variation of PolarisStorageIntegrationProvider as the
> interface definition for *constructing/leasing* a new
> PolarisStorageIntegration strictly for CreateCatalog (or Create for any
> entity that might hold a StorageConfig)
> 3. Maybe reconcile those two "using" and "constructing" methods under a
> single unified interface that is still injectable
> 4. If we want some persistence types like the NoSql stack to always stay
> away from being in the business of facilitating credential vending, the
> callsite should obtain a PolarisCredentialVendor from some injection or
> factory method -- the default PolarisCredentialVendor can indeed be the
> current dumb in-memory "Construct from application-level configs in-memory
> and vend immediately" implementation.
> 5. Persistence impls that need stateful cooperation with a credential pool
> can still configure to provide their existing "CustomMetaStoreManagerImpl
> implements PolarisCredentialVendor" as the thing that's returned to
> callsites wanting to get credentials
> 6. We can evolve the PolarisCredentialVendor method interface to take a
> whole PolarisEntity instead of entityId, type, etc., to potentially avoid a
> re-lookup. Such a syntactic change is still *semantically* compatible with
> the SPI because any impls that used to need enttyId can still call
> entity.getId() if we pass in the whole entity
> 7. Alternatively, if we really want to get callsites to call
> getSubscopedCreds directly on a PolarisStorageIntegration object, maybe
> it's possible to model the persistence cooperation in subclasses of
> PolarisStorageIntegration itself, but we still need a way to distinguish
> between whether a caller is *constructing* a new StorageIntegration or
> *using* an entity-linked StorageIntegration
>
> On Wed, Apr 1, 2026 at 1:34 PM Tornike Gurgenidze <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I'd like to bring up a refactoring effort around credential vending that
> > I've been working on in PR #3699
> > <https://github.com/apache/polaris/pull/3699>. Dmitri has been providing
> > feedback and helping a lot along the way, but I wanted to open this up
> for
> > broader discussion before iterating further.
> >
> > Motivation
> >
> > The current credential vending flow is deeply entangled with the
> > persistence layer. When a client requests scoped credentials (e.g. for
> S3,
> > GCS, or Azure), the request goes through: StorageCredentialsVendor ->
> > PolarisCredentialVendor -> MetaStoreManager -> persistence layer -> back
> > out through PolarisStorageIntegrationProvider. This means credential
> > vending re-loads entities from persistence even though the caller already
> > has them, and MetaStoreManager implementations are burdened with
> credential
> > vending logic that doesn't belong in persistence.
> >
> > Overall, the sheer amount of complexity and the amount of layers that
> > credential vending flow goes through makes further changes particularly
> > challenging as evidenced by some recent efforts around cache key
> > generation, storage info resolution, additional storage backends and so
> on.
> >
> > What the PR does
> >
> > 1. Removes credential vending from MetaStoreManager. The
> > PolarisCredentialVendor interface, StorageCredentialsVendor, and
> > getSubscopedCredsForEntity() implementations are removed from
> > MetaStoreManager. This cleans up both the transactional and NoSQL
> backends.
> >
> > 2. Moves orchestration into StorageAccessConfigProvider. This
> > application-scoped bean now directly resolves the storage integration and
> > delegates to it, cutting out the persistence round-trip.
> >
> > 3. Moves caching into storage integrations. Each
> PolarisStorageIntegration
> > subclass (AWS, GCP, Azure) now owns its StorageCredentialCache
> interaction
> > and builds cloud-specific cache keys, rather than using a
> one-size-fits-all
> > key.
> >
> > I'd appreciate any feedback on the overall direction, concerns about API
> > compatibility in polaris-core, or suggestions for how to best land these
> > changes.
> >
> > Thanks,
> > Tornike
> >
>

Re: [DISCUSS] Credential Vending Refactor (PR #3699)

Reply via email to