Hi Everyone,

We had an opportunity to discuss this feature and my recent proposal at
the last community sync meeting. I would like to summarize our discussion
and enumerate the various options we considered to help us reach a
consensus.

To recap, storage configuration is currently restricted to the catalog
level. This limits flexibility for users who need to organize tables across
different storage configurations or cloud providers within a single
catalog. There appears to be general agreement on the utility of this
feature; however, we still need to align on the specific implementation
approach.

Here are the various options that were considered.

*Option 0: Make credentials available as part of table properties.* (This
was my original proposal, but I abandoned it after becoming aware of the
security implications.)

*Option 1: First-Class Storage Configuration Entity*

This approach proposes elevating StorageConfiguration to a standalone,
top-level resource in the Polaris backend (similar to a Principal,
Namespace or Table), independent of the Catalog or Table. This is the
approach in my most recent proposal doc.

Data Model: A new StorageConfiguration entity is created with its own
unique identifier and lifecycle. Tables and Namespaces would store a
reference ID pointing to this entity rather than embedding the credentials
directly.

Security: This model offers the cleanest security boundary. We can
introduce a specific USAGE privilege on the configuration entity. A user
would need both CREATE_TABLE on the Namespace *and* USAGE on the specific
StorageConfiguration to link them.

Credential Rotation: Highly efficient. Because the configuration is
referenced by ID, rotating a cloud IAM role or secret requires updating
only the single StorageConfiguration entity. Every downstream table
referencing it, potentially thousands, would immediately use the new
credentials without any metadata updates.

Inheritance: The reference could be set at the Catalog, Namespace, or Table
level. If a Table does not specify a reference, it would inherit the
reference from its parent Namespace (and so on), preserving the current
hierarchical behavior while adding granularity.

• Pros: Maximum flexibility and reusability (Many-to-Many). Updating one
config object propagates to all associated tables.

• Cons: Highest engineering cost. Requires new CRUD APIs, DB schema changes
(mapping tables), and complex authorization logic (two-stage auth checks).
Risk of accumulating "orphaned" configs.
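
As a rough illustration of Option 1, here is a minimal Python sketch, not
Polaris code. The class names, the storage_config_id field, the in-memory
CONFIGS and GRANTS stores, and the example ARNs are all assumptions made for
illustration; only the CREATE_TABLE and USAGE privileges and the
reference-by-ID idea come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class StorageConfiguration:
    """Standalone entity with its own ID (hypothetical shape)."""
    config_id: str
    role_arn: str


@dataclass
class Entity:
    """A Catalog, Namespace, or Table that may reference a config by ID."""
    name: str
    parent: Optional["Entity"] = None
    storage_config_id: Optional[str] = None


# In-memory stand-ins for the persistence layer and the grant store.
CONFIGS: Dict[str, StorageConfiguration] = {}
GRANTS: Set[Tuple[str, str, str]] = set()  # (principal, privilege, securable)


def resolve_config(entity: Entity) -> StorageConfiguration:
    """Walk Table -> Namespace -> Catalog until a reference is found."""
    node: Optional[Entity] = entity
    while node is not None:
        if node.storage_config_id is not None:
            return CONFIGS[node.storage_config_id]
        node = node.parent
    raise LookupError(f"no storage configuration reachable from {entity.name}")


def can_link(principal: str, namespace: Entity, config_id: str) -> bool:
    """Two-stage check: CREATE_TABLE on the namespace and USAGE on the config."""
    return (
        (principal, "CREATE_TABLE", namespace.name) in GRANTS
        and (principal, "USAGE", config_id) in GRANTS
    )


# Example hierarchy: only the catalog carries a reference (made-up values).
CONFIGS["sc-1"] = StorageConfiguration("sc-1", "arn:aws:iam::111111111111:role/old-role")
catalog = Entity("analytics", storage_config_id="sc-1")
namespace = Entity("finance", parent=catalog)
table = Entity("ledger", parent=namespace)

# Rotation touches only the shared entity; the table's own metadata is unchanged.
CONFIGS["sc-1"].role_arn = "arn:aws:iam::111111111111:role/new-role"
```

In this sketch, resolve_config(table) returns the rotated role even though
the table itself was never updated, which is the property that makes
rotation cheap under this option.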

Option 2: The "Embedded Field" Model

This approach extends the existing Table and Namespace entities to include
a storageConfig field. The field can default to null, in which case the
parent's storageConfig is used at runtime.

*Data Model:* No new top-level entity is created. The storage details
(e.g., roleArn) are stored directly in a new, dedicated column or
structure within the existing Table/Namespace entity.

Complexity: This could reduce the engineering overhead significantly. There
are no new CRUD endpoints for configuration objects and no referential
integrity checks (e.g., preventing the deletion of a config used by active
tables).

Credential Rotation: Credential rotation is difficult. If an IAM role
changes, an administrator must identify and issue UPDATE operations for
every individual table or namespace that uses that specific configuration,
potentially affecting thousands of objects.

• Pros: Lowest engineering cost. No new entities or complex mappings are
required. Easy to reason about authorization (auth is tied strictly to the
entity).

• Cons: No reusability. Configs must be duplicated across tables; rotating
credentials for 1,000 tables could require 1,000 update calls.
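
To illustrate the rotation cost under Option 2, here is a small Python
sketch. It is only an illustration under assumed names: TableEntity,
storage_config, and rotate_role are hypothetical, and the in-memory list
stands in for whatever persistence backend is in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableEntity:
    """Table with an embedded config; None means "use the parent's config"."""
    name: str
    storage_config: Optional[dict] = None


def rotate_role(tables: List[TableEntity], old_arn: str, new_arn: str) -> int:
    """Rewrite every embedded copy of old_arn and return the number of updates."""
    updates = 0
    for table in tables:
        cfg = table.storage_config
        if cfg is not None and cfg.get("roleArn") == old_arn:
            cfg["roleArn"] = new_arn  # one UPDATE per entity in a real backend
            updates += 1
    return updates


# 1,000 tables that each carry their own copy of the same role (made-up ARNs).
OLD = "arn:aws:iam::111111111111:role/old-role"
NEW = "arn:aws:iam::111111111111:role/new-role"
tables = [TableEntity(f"t{i}", {"roleArn": OLD}) for i in range(1000)]
tables.append(TableEntity("inherits-from-parent"))

update_count = rotate_role(tables, OLD, NEW)
```

Here update_count equals the number of tables holding a copy of the old
role, which is the duplication cost listed under Cons.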

Option 3: Named Catalog-Level Configurations (Hybrid)

This is a combination of Option 1 and Option 2. An admin can define a
registry of "Named Storage Configurations" stored within the Catalog.
Sub-entities (Namespaces/Tables) reference these configs by name (e.g.,
storage-config: "finance-secure-role").

*Data Model:* No separate top-level entity is created. The Catalog entity
would likely need to be modified to accommodate named storage
configurations.

Credential Rotation: Credentials can be rotated at the catalog level for
each named storage configuration.

Inheritance: Works much the same as in Option 1 and Option 2.

Security: Not as secure as Option 1, but still useful. A principal with
proper access can attach any named storage configuration defined at the
catalog level to any entity within the catalog.

• Pros: Good balance of reusability and simplicity. Allows updating a
config in one place (the Catalog definition) without needing a full-blown
global entity system.

• Cons: Scope is limited to the Catalog (configs cannot be shared across
catalogs).
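
A minimal Python sketch of the Option 3 lookup follows. It is an
illustration under assumptions, not Polaris code: the Catalog and Table
classes, the default_config field, and the example role ARNs are made up,
and Namespaces are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Catalog:
    """Catalog that owns a registry of named storage configurations."""
    name: str
    default_config: str
    named_storage_configs: Dict[str, dict] = field(default_factory=dict)


@dataclass
class Table:
    """Table that optionally references a config by name within its catalog."""
    name: str
    catalog: Catalog
    storage_config_name: Optional[str] = None


def effective_config(table: Table) -> dict:
    """Resolve the table's named config, falling back to the catalog default."""
    name = table.storage_config_name or table.catalog.default_config
    try:
        return table.catalog.named_storage_configs[name]
    except KeyError:
        raise LookupError(
            f"{name!r} is not defined in catalog {table.catalog.name!r}"
        ) from None


analytics = Catalog(
    "analytics",
    default_config="default",
    named_storage_configs={
        "default": {"roleArn": "arn:aws:iam::111111111111:role/default"},
        "finance-secure-role": {"roleArn": "arn:aws:iam::111111111111:role/finance"},
    },
)
ledger = Table("ledger", analytics, storage_config_name="finance-secure-role")
events = Table("events", analytics)  # no name set, so the catalog default applies

# Rotation happens once, in the catalog's registry.
analytics.named_storage_configs["finance-secure-role"]["roleArn"] = (
    "arn:aws:iam::111111111111:role/finance-v2"
)
```

Because names are resolved against the owning catalog only, a table in a
different catalog that used the name "finance-secure-role" would fail the
lookup, which is the scope limitation noted under Cons.
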

Option 4: Leverage Existing Policy Framework

This approach leverages the existing Apache Polaris Policy Framework
(currently used for features like snapshot expiry) to manage storage
settings.

Data Model: Storage configurations are defined as "Policies" at the Catalog
level. These Policies contain the credential details and can be attached to
Namespaces or Tables using the existing policy attachment APIs.

Inheritance: This aligns naturally with Polaris's existing architecture,
where policies cascade from Catalog → Namespace → Table. The vending logic
would simply resolve the "effective" storage policy for a table at query
time.

Security: This uses the existing Polaris privileges, including the policy
attachment privileges. Administrators can define authorized storage policies
centrally, and users can only select from these pre-approved policies,
preventing them from inputting arbitrary or insecure role ARNs.

• Pros:
  - Zero New Infrastructure: Reuses the existing "Policy" entity,
    persistence layer, and inheritance logic, significantly reducing
    engineering effort.
  - Proven Inheritance: The logic for resolving policies from child to
    parent is already implemented and tested.

• Cons:
  - Semantic Confusion: Policies are typically used for "governance rules"
    (e.g., snapshot expiry, compaction) rather than "connectivity
    configuration." Using them for credentials might be unintuitive.
  - Authorization Complexity: The authorizer would need to load and
    evaluate policies to determine how to access data, potentially coupling
    governance logic with data access paths.
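
The "effective policy" resolution in Option 4 could look roughly like the
Python sketch below. This does not use the actual Polaris policy APIs: the
POLICIES and ATTACHMENTS dictionaries, the attach helper, the policy names,
and the role ARNs are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# ("catalog",), ("catalog", "namespace"), or ("catalog", "namespace", "table")
Path = Tuple[str, ...]

# Policy name -> content, defined centrally by an administrator (made-up values).
POLICIES: Dict[str, dict] = {
    "default-storage": {"roleArn": "arn:aws:iam::111111111111:role/default"},
    "finance-storage": {"roleArn": "arn:aws:iam::111111111111:role/finance"},
}

# Target path -> name of the attached storage policy.
ATTACHMENTS: Dict[Path, str] = {}


def attach(path: Path, policy_name: str) -> None:
    """Attach a policy to a target; only pre-approved policies are accepted."""
    if policy_name not in POLICIES:
        raise ValueError(f"unknown storage policy: {policy_name!r}")
    ATTACHMENTS[path] = policy_name


def effective_storage_policy(path: Path) -> Optional[dict]:
    """Return the nearest attached policy, checking the table first, then parents."""
    for depth in range(len(path), 0, -1):
        name = ATTACHMENTS.get(path[:depth])
        if name is not None:
            return POLICIES[name]
    return None


attach(("analytics",), "default-storage")
attach(("analytics", "finance"), "finance-storage")

ledger_policy = effective_storage_policy(("analytics", "finance", "ledger"))
clicks_policy = effective_storage_policy(("analytics", "web", "clicks"))
```

In this sketch the ledger table picks up the policy attached to its
namespace, while the clicks table falls back to the catalog-level policy. A
user cannot attach an arbitrary role ARN, only a policy name that an
administrator has already defined.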

We could start with one of these options and, as the feature and user
needs develop, migrate to another. Please let me know your thoughts on the
options above, or on anything I might have missed, so that we can work
toward a consensus on how to implement this feature.


On Thu, Feb 5, 2026 at 8:08 AM Tornike Gurgenidze <[email protected]>
wrote:

> Hi,
>
> To follow up on Dmitri's point about credentials, there's already a PR
> <https://github.com/apache/polaris/pull/3409> up that is going to allow
> predefining named storage credentials in polaris config like the following:
>
>    - polaris.storage.aws.<storage-name>.access-key
>    - polaris.storage.aws.<storage-name>.secret-key
>
> then storage configuration will simply refer to it by name and
> inherit credentials.
>
> I think that can go hand in hand with table-level overrides. Overriding
> each and every AWS property for every table doesn't sound ideal. Defining a
> storage configuration upfront and referring to it by name should be a
> simpler solution. I can extend the scope of the PR above to allow
> predefining other AWS properties as well, like endpoint-url and region.
>
> Another point that came up in the discussion surrounding extra credentials
> is how to make sure no one can simply hijack preconfigured credentials.
> The simplest solution I see there is to ship off properties to OPA during
> catalog (and table) creation and allow users to write policies based on
> them. If we want to enable internal RBAC to have a similar capability, we
> can go further and move from config-based storage definition to a separate
> `/storage-config` REST resource in the management API that will come with
> the necessary grants and permissions.
>
> On Thu, Feb 5, 2026 at 5:43 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi Srinivas,
> >
> > Thanks for the proposal. It looks good to me overall, a very timely
> > feature to add to Polaris.
> >
> > I added some comments in the doc and I see this topic on the Community
> > Sync agenda for Feb 5. Looking forward to discussing it online.
> >
> > I have three points to highlight:
> >
> > * Dealing with passwords probably connects to the Secrets Manager
> > discussion [1]
> >
> > * Persistence needs to consider non-RDBMS backends. OSS code has both
> > PostgreSQL and MongoDB, but private Persistence implementations are
> > possible too. I believe we need a proper SPI for this, not just a
> > relational schema example.
> >
> > * Associating entities (tables, namespaces) to Storage Configuration is
> > likely a plugin point that downstream projects may want to customize. I'd
> > propose making another SPI for this. This SPI is probably different from
> > the new Persistence SPI mentioned above since the concern here is not
> > persistence per se, but the logic of finding the right storage config.
> >
> > [1] https://lists.apache.org/thread/68r3gcx70f0qhbtz3w4zhb8f9s4vvw1f
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Feb 2, 2026 at 4:18 PM Srinivas Rishindra <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > We had an opportunity to discuss the community sprint last week. Based
> > > on that discussion, I have created a new design doc which I am
> > > attaching here. Instead of passing credentials via table properties,
> > > this design introduces Inheritable Storage Configurations as a
> > > first-class feature. Please let me know your thoughts on the document.
> > >
> > > https://docs.google.com/document/d/1hbDkE-w84Pn_112iW2vCnlDKPDtyg8flaYcFGjvD120/edit?usp=sharing
> > >
> > >
> > > On Mon, Jan 26, 2026 at 10:42 PM Yufei Gu <[email protected]>
> > > wrote:
> > >
> > > > Hi Srinivas,
> > > >
> > > > Thanks for sharing this proposal. Persisting long-lived credentials
> > > > such as an S3 secret access key directly in table properties raises
> > > > significant security concerns. Here is an alternative approach
> > > > previously discussed, which enables storage configuration at the
> > > > table or namespace level, and it is probably a more secure and
> > > > promising direction overall.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Mon, Jan 26, 2026 at 8:18 PM Srinivas Rishindra <[email protected]>
> > > > wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > I have developed a design proposal for Table-Level Storage
> > > > > Credential Overrides in Apache Polaris.
> > > > >
> > > > > The core objective is to allow specific storage properties to be
> > > > > defined at the table level rather than the catalog level, enabling
> > > > > a single logical catalog to support tables across disparate storage
> > > > > systems. Crucially, the implementation ensures these overrides
> > > > > participate in the credential vending process to maintain secure,
> > > > > scoped access.
> > > > >
> > > > > I have also implemented a Proof of Concept (POC) pull request to
> > > > > demonstrate the idea. While the current MVP focuses on S3, I intend
> > > > > to expand scope to include Azure and GCS pending community feedback.
> > > > >
> > > > > I look forward to your thoughts and suggestions on this proposal.
> > > > >
> > > > > Links:
> > > > >
> > > > > - Design Doc: Table-Level Storage Credential Overrides
> > > > >   (https://docs.google.com/document/d/1tf4N8GKeyAAYNoP0FQ1zT1Ba3P1nVGgdw3nmnhSm-u0/edit?usp=sharing)
> > > > > - POC PR: https://github.com/apache/polaris/pull/3563
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Srinivas Rishindra Pothireddi
> > > > >
> > > >
> > >
> >
>
