Hi everyone,

We had an opportunity to discuss this feature and my recent proposal at the last community sync meeting. I would like to summarize our discussion and enumerate the options we considered to help us reach a consensus.

To recap, storage configuration can currently be set only at the catalog level. This limits flexibility for users who need to organize tables across different storage configurations or cloud providers within a single catalog. There appears to be general agreement on the utility of this feature; however, we still need to align on the specific implementation approach. Here are the options that were considered.

*Option 0: Make Credentials Available as Part of Table Properties*

(This was my original proposal, which I abandoned after becoming aware of the security implications.)

*Option 1: First-Class Storage Configuration Entity*

This approach proposes elevating StorageConfiguration to a standalone, top-level resource in the Polaris backend (similar to a Principal, Namespace, or Table), independent of the Catalog or Table. This is the approach in my most recent proposal doc.

- Data Model: A new StorageConfiguration entity is created with its own unique identifier and lifecycle. Tables and Namespaces would store a reference ID pointing to this entity rather than embedding the credentials directly.
- Security: This model offers the cleanest security boundary. We can introduce a specific USAGE privilege on the configuration entity. A user would need both CREATE_TABLE on the Namespace *and* USAGE on the specific StorageConfiguration to link them.
- Credential Rotation: Highly efficient. Because the configuration is referenced by ID, rotating a cloud IAM role or secret requires updating only the single StorageConfiguration entity. All downstream tables referencing it, even thousands of them, would immediately use the new credentials without metadata updates.
- Inheritance: The reference could be set at the Catalog, Namespace, or Table level. If a Table does not specify a reference, it would inherit the reference from its parent Namespace (and so on up the hierarchy), preserving the current hierarchical behavior while adding granularity.

• Pros: Maximum flexibility and reusability (many-to-many).
  Updating one config object propagates to all associated tables.
• Cons: Highest engineering cost. Requires new CRUD APIs, DB schema changes (mapping tables), and complex authorization logic (two-stage auth checks). Risk of accumulating "orphaned" configs.

*Option 2: The "Embedded Field" Model*

This approach extends the existing Table and Namespace entities to include a storageConfig field. The field can default to null, in which case the parent's storageConfig is used at runtime.

- Data Model: No new top-level entity is created. The storage details (e.g., roleArn) are stored directly in a new, dedicated column or structure within the existing Table/Namespace entity.
- Complexity: This could reduce the engineering overhead significantly. There are no new CRUD endpoints for configuration objects and no referential integrity checks (e.g., preventing the deletion of a config used by active tables).
- Credential Rotation: Difficult. If an IAM role changes, an administrator must identify every table or namespace that uses that configuration and issue an UPDATE operation for each one, potentially affecting thousands of objects.

• Pros: Lowest engineering cost. No new entities or complex mappings are required. Easy to reason about authorization (auth is tied strictly to the entity).
• Cons: No reusability. Configs must be duplicated across tables; rotating credentials for 1,000 tables could require 1,000 update calls.

*Option 3: Named Catalog-Level Configurations (Hybrid)*

This is a combination of Option 1 and Option 2. An admin can define a registry of "Named Storage Configurations" stored within the Catalog. Sub-entities (Namespaces/Tables) reference these configs by name (e.g., storage-config: "finance-secure-role").

- Data Model: No separate top-level entity is created. The Catalog entity potentially needs to be modified to accommodate named storage configurations.
- Credential Rotation: Can be done at the catalog level for each named storage configuration.
- Inheritance: Works much the same as proposed in Option 1 and Option 2.
- Security: Not as secure as Option 1, but still useful. A principal with proper access can attach any named storage configuration defined at the catalog level to any arbitrary entity within the catalog.

• Pros: Good balance of reusability and simplicity. Allows updating a config in one place (the Catalog definition) without needing a full-blown global entity system.
• Cons: Scope is limited to the Catalog (configs cannot be shared across catalogs).

*Option 4: Leverage Existing Policy Framework*

This approach leverages the existing Apache Polaris Policy Framework (currently used for features like snapshot expiry) to manage storage settings.

- Data Model: Storage configurations are defined as "Policies" at the Catalog level. These Policies contain the credential details and can be attached to Namespaces or Tables using the existing policy attachment APIs.
- Inheritance: This aligns naturally with Polaris's existing architecture, where policies cascade from Catalog → Namespace → Table. The vending logic would simply resolve the "effective" storage policy for a table at query time.
- Security: This uses the existing Polaris privileges and attachment privileges. Administrators can define authorized storage policies centrally, and users can only select from these pre-approved policies, which prevents them from supplying arbitrary or insecure role ARNs.

• Pros:
  - Zero New Infrastructure: Reuses the existing "Policy" entity, persistence layer, and inheritance logic, significantly reducing engineering effort.
  - Proven Inheritance: The logic for resolving policies from child to parent is already implemented and tested.
• Cons:
  - Semantic Confusion: Policies are typically used for "governance rules" (e.g., snapshot expiry, compaction) rather than "connectivity configuration."
    Using them for credentials might be unintuitive.
  - Authorization Complexity: The authorizer would need to load and evaluate policies to determine how to access data, potentially coupling governance logic with data access paths.

We could start with one of these options and, as the feature and user needs develop, migrate to another option later.

Please let me know your thoughts on the options above, or on anything I might have missed, so that we can work towards a consensus on how to implement this feature.

On Thu, Feb 5, 2026 at 8:08 AM Tornike Gurgenidze <[email protected]> wrote:

> Hi,
>
> To follow up on Dmitri's point about credentials, there's already a PR
> <https://github.com/apache/polaris/pull/3409> up that is going to allow
> predefining named storage credentials in polaris config like the following:
>
> - polaris.storage.aws.<storage-name>.access-key
> - polaris.storage.aws.<storage-name>.secret-key
>
> then storage configuration will simply refer to it by name and
> inherit credentials.
>
> I think that can go hand in hand with table-level overrides. Overriding
> each and every aws property for every table doesn't sound ideal. Defining a
> storage configuration upfront and referring to it by name should be a
> simpler solution. I can extend the scope of the PR above to allow
> predefining other aws properties as well like endpoint-url and region.
>
> Another point that came up in the discussion surrounding extra credentials
> is how to make sure anyone can't just hijack pre configured credentials.
> The simplest solution I see there is to ship off properties to OPA during
> catalog (and table) creation and allow users to write policies based on
> them. If we want to enable internal rbac to have a similar capability we
> can go further and move from config based storage definition to a separate
> `/storage-config` rest resource in management API that will come with
> necessary grants and permissions.
>
> On Thu, Feb 5, 2026 at 5:43 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi Srinivas,
> >
> > Thanks for the proposal. It looks good to me overall, a very timely feature
> > to add to Polaris.
> >
> > I added some comments in the doc and I see this topic on the Community Sync
> > agenda for Feb 5. Looking forward to discussing it online.
> >
> > I have three points to highlight:
> >
> > * Dealing with passwords probably connects to the Secrets Manager
> > discussion [1]
> >
> > * Persistence needs to consider non-RDBMS backends. OSS code has both
> > PostgreSQL and MongoDB, but private Persistence implementations are
> > possible too. I believe we need a proper SPI for this, not just a
> > relational schema example.
> >
> > * Associating entities (tables, namespaces) to Storage Configuration is
> > likely a plugin point that downstream projects may want to customize. I'd
> > propose making another SPI for this. This SPI is probably different from
> > the new Persistence SPI mentioned above since the concern here is not
> > persistence per se, but the logic of finding the right storage config.
> >
> > [1] https://lists.apache.org/thread/68r3gcx70f0qhbtz3w4zhb8f9s4vvw1f
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Feb 2, 2026 at 4:18 PM Srinivas Rishindra <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > We had an opportunity to discuss the community sprint last week. Based on
> > > that discussion, I have created a new design doc which I am attaching here.
> > > In this design instead of passing credentials via table properties, this
> > > design introduces Inheritable Storage Configurations as a first-class
> > > feature. Please let me know your thoughts on the document.
> > >
> > > https://docs.google.com/document/d/1hbDkE-w84Pn_112iW2vCnlDKPDtyg8flaYcFGjvD120/edit?usp=sharing
> > >
> > > On Mon, Jan 26, 2026 at 10:42 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi Srinivas,
> > > >
> > > > Thanks for sharing this proposal. Persisting long lived credentials such as
> > > > an S3 secret access key directly in table properties raises significant
> > > > security concerns. Here is an alternative approach previously discussed,
> > > > which enables storage configuration at the table or namespace level, and it
> > > > is probably a more secure and promising direction overall.
> > > >
> > > > Yufei
> > > >
> > > > On Mon, Jan 26, 2026 at 8:18 PM Srinivas Rishindra <[email protected]>
> > > > wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > I have developed a design proposal for Table-Level Storage Credential
> > > > > Overrides in Apache Polaris.
> > > > >
> > > > > The core objective is to allow specific storage properties to be defined at
> > > > > the table level rather than the catalog level, enabling a single logical
> > > > > catalog to support tables across disparate storage systems. Crucially, the
> > > > > implementation ensures these overrides participate in the credential
> > > > > vending process to maintain secure, scoped access.
> > > > >
> > > > > I have also implemented a Proof of Concept (POC) pull request to
> > > > > demonstrate the idea. While the current MVP focuses on S3, I intend to
> > > > > expand scope to include Azure and GCS pending community feedback.
> > > > >
> > > > > I look forward to your thoughts and suggestions on this proposal.
> > > > >
> > > > > Links:
> > > > >
> > > > > - Design Doc: Table-Level Storage Credential Overrides (
> > > > > https://docs.google.com/document/d/1tf4N8GKeyAAYNoP0FQ1zT1Ba3P1nVGgdw3nmnhSm-u0/edit?usp=sharing
> > > > > )
> > > > > - POC PR: https://github.com/apache/polaris/pull/3563 (
> > > > > https://github.com/apache/polaris/pull/3563)
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Srinivas Rishindra Pothireddi
