In general this sigv4 indirection control-flow should mirror the analogous patterns we apply on the StorageConfigInfo side (and perhaps long-term we can better consolidate the STS logic for the two), so I'd agree it's not even necessarily federation-specific.

There's some precedent for the use-case of a "self-run Polaris" user wanting to just use simple server-wide configuration for StorageConfigInfo already: SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION

https://github.com/apache/polaris/blob/4db7998381a61e9cab82cdc4fded6867b0bca464/service/common/src/main/java/org/apache/polaris/service/catalog/io/FileIOUtil.java#L92

For this Catalog Federation sigv4 case we could introduce a similar feature configuration; whether or not this feature configuration is the exact way we want to do it long-term, it would make sense to refactor both the ConnectionConfig and StorageConfig parts together in the future.

One important concept for this simple approach is that instead of getting into the business of having Polaris actually try to juggle long-lived credentials for IAM Users explicitly, this "simple case" can just inherit "environment-provided" credentials and let low-level SDK libraries use their default "credential chain" logic.

So basically we'd have 2 modes of running Polaris:

1. Secure multi-tenant - Polaris will have opinionated/constrained scaffolding via layers of credential indirection, subscoping, secrets-management, etc.
2. Single-tenant - Polaris will be more hands-off in terms of secrets management, instead allowing thick clients to use typical "environment-provided" credentials (e.g. environment variables, EC2 instance-metadata endpoint, local credential files, etc)

On Fri, May 2, 2025 at 4:28 PM Dmitri Bourlatchkov <di...@apache.org> wrote:

> I think this discussion moves slightly out of the scope of catalog federation and into handling secrets :) ... but the points you're making are quite valid.
>
> Let's keep them in mind when we reopen the secrets handling discussion.
>
> Cheers,
> Dmitri.
>
> On Fri, May 2, 2025 at 7:04 PM Rulin Xing <ru...@apache.org> wrote:
>
> > Hi Dmitri,
> >
> > Totally agree that we need to recognize the self-managed deployment case as a first-class scenario.
> > That means we should provide a way to configure Polaris with long-lived credentials.
> >
> > I see a couple of options for supporting this:
> >
> > 1. From env vars or server config, e.g.:
> > * POLARIS_IAM_USER_AWS_ACCESS_KEY_ID
> > * POLARIS_IAM_USER_AWS_SECRET_ACCESS_KEY
> > * POLARIS_IAM_USER_ARN
> > In this case, `roleArn` would not be required.
> >
> > 2. Configured via the Polaris Management API: Stick to `SigV4AuthenticationParameters`
> >
> > If we stick with the existing `SigV4AuthenticationParameters` type, we could:
> > * Make roleArn optional
> > * Add `iamUserAwsAccessKeyId` and `iamUserAwsSecretAccessKey` as optional fields
> >
> > 3. Configured via the Polaris Management API: Add new auth type
> >
> > We could create a new type to distinguish clearly:
> > * New AuthenticationType enum: SIGV4_STS, SIGV4_STATIC_CREDS
> >
> > 4. Configured via the Polaris Management API: Add new auth types
> >
> > We could create a new sub type to distinguish clearly:
> > e.g. new subtype under SigV4AuthenticationParameters: STS, CREDS
> >
> > Personally, I would prefer option 4. WDYT?
> >
> > I'll include these options in my PR as well for discussion.
> >
> > Best,
> > Rulin
> >
> > On 2025/05/02 17:16:44 Dmitri Bourlatchkov wrote:
> > > Thanks for your message, Rulin! You made good points and I agree with them.
> > >
> > > I'm planning to introduce a `PolarisConnectionCredentialVendor`
> > >
> > > Looking forward to this proposal!
> > >
> > > The goal is to draw a clear boundary between user-provided input and Polaris-generated service info [...]
> > >
> > > I support this goal, however, I'd like to emphasise that there may be some skew in different deployment models.
> > >
> > > Traditionally Polaris was envisioned as a service running for multiple users from distinct organisations, I guess. However, when Apache Polaris releases binary artifacts users will be able to run their own deployments.
> > > In that situation, the boundary between what is configured at the deployment level and what is configured via the Polaris Management API may not be as sharp.
> > >
> > > I believe we need to recognise the self-managed deployment case and consider it as a mainstream case. I'm sure we're going to have some real users behind this use case soon.
> > >
> > > Specifically for the SigV4 authentication option in Federated Catalogs, I guess this means that users may want to use simpler key/secret pairs as input for secure connections to AWS services like Glue. In self-managed deployments this is not a security risk, from my POV.
> > >
> > > Would you consider it as a possible future enhancement?
> > >
> > > If yes, do you think it would fall under the proposed SigV4AuthenticationParameters (as a set of new optional attributes perhaps)?.. or maybe be a different config type altogether? (this is related to my GH comment about type names, but the problem is bigger than just naming, I think).
> > >
> > > I do not question that the STS / assume role path offers better security guarantees. My point is that it may still be valuable for OSS users to have simpler connection options.
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Thu, May 1, 2025 at 9:54 PM Rulin Xing <ru...@apache.org> wrote:
> > >
> > > > Hi Dmitri,
> > > >
> > > > Thanks for the thoughtful questions!
> > > >
> > > > 1. Does this assume the use of STS?
> > > >
> > > > Yes, the current spec changes assume the use of STS. Polaris acts as a service provider and assumes IAM roles provided by users to access AWS resources like Glue Catalogs. This model avoids long-lived credentials and enables secure, temporary access via STS-issued credentials.
> > > >
> > > > 2. Why is plain key/secret SigV4 not an option?
> > > >
> > > > We can support plain key/secret credentials for SigV4, particularly in self-managed deployments where users own both the Polaris deployment and AWS accounts. However, to reduce security risks, we don't want to store long-lived credentials directly in the catalog entity. A more secure approach is to reference them using `UserSecretReference` (added by @dennishuo) and retrieve them through `UserSecretsManager`.
> > > >
> > > > 3. Where is Polaris expected to get credentials for STS requests?
> > > >
> > > > Polaris obtains credentials for STS calls from its own runtime environment, such as server config, environment variables, or cloud-native options like instance profiles. These are used to call AssumeRole on the user-provided IAM role.
> > > >
> > > > To support both temporary and static credential workflows, I'm planning to introduce a `PolarisConnectionCredentialVendor` (or `PolarisCredentialManager`) interface. This class will:
> > > > * Provide Polaris-generated service info (what we call vendor info) such as `userArn`, `externalId`, `consentUrl`, or `gcsServiceAccount`, which will be injected into the catalog entity's connection config / storage config. This info is exposed to users when they load the catalog entity and is needed for setting up the appropriate permissions (e.g., allowing Polaris to assume roles).
> > > > * Retrieve temporary credentials from cloud providers (e.g., AWS STS, Azure identity services) when needed to perform authenticated operations.
> > > >
> > > > The goal is to draw a clear boundary between user-provided input and Polaris-generated service info (something that's currently unclear in storage configs).
> > > > In the long term, we're aiming to unify both connection and storage credential handling in this interface to simplify the overall architecture and improve security.
> > > >
> > > > Best,
> > > > Rulin
> > > >
> > > > On 2025/05/01 22:02:32 Dmitri Bourlatchkov wrote:
> > > > > Hi Rulin,
> > > > >
> > > > > Thanks for the informative description in the PR!
> > > > >
> > > > > It looks like the authentication method relies on STS. As such it is a sub-case of SigV4, I believe, because SigV4 can be used with plain key/secret credentials without assuming a role.
> > > > >
> > > > > If that is so, could you clarify that in the description?
> > > > >
> > > > > Is there any particular reason for not supporting plain key/secret credentials?
> > > > >
> > > > > When STS is in use, where is Polaris expected to get credentials for STS requests?
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Thu, May 1, 2025 at 5:37 PM Rulin Xing <ru...@apache.org> wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > Just wanted to surface a new API spec update proposal related to Catalog Federation:
> > > > > >
> > > > > > https://github.com/apache/polaris/pull/1506
> > > > > >
> > > > > > This adds support for AWS SigV4 authentication, enabling Polaris to federate to external Iceberg REST catalogs hosted behind services like AWS Glue, S3Tables, or API Gateway.
> > > > > >
> > > > > > It builds on earlier federation work and introduces a set of properties to support role assumption and request signing via SigV4.
> > > > > >
> > > > > > Feedback on the spec or implementation is welcome!
> > > > > >
> > > > > > Best,
> > > > > > Rulin
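
As a rough illustration of the two modes of running Polaris described at the top of this thread, here is a minimal Java sketch. Everything in it is hypothetical: `SigV4CredentialModeSketch`, `Mode`, `resolveMode`, and the `skipIndirection` flag (standing in for a feature configuration analogous to SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION) are names invented for this example and are not Polaris APIs. It only models the decision between the STS assume-role path and deferring to the SDK's default credential chain; it makes no AWS calls.

```java
import java.util.Optional;

// Hypothetical sketch only: none of these names exist in Apache Polaris.
final class SigV4CredentialModeSketch {

  /** The two ways of running Polaris discussed in the thread. */
  enum Mode {
    /** Secure multi-tenant: Polaris assumes the user-provided IAM role via STS. */
    STS_ASSUME_ROLE,
    /** Single-tenant: defer to the SDK's default "credential chain". */
    ENVIRONMENT_DEFAULT_CHAIN
  }

  /**
   * Picks a credential mode for a federated catalog connection.
   *
   * @param skipIndirection assumed server-wide feature flag; when true, Polaris
   *     does not manage credentials itself
   * @param roleArn the user-provided role to assume, if any
   */
  static Mode resolveMode(boolean skipIndirection, Optional<String> roleArn) {
    if (skipIndirection) {
      // Single-tenant: inherit environment-provided credentials
      // (env vars, instance metadata, local credential files).
      return Mode.ENVIRONMENT_DEFAULT_CHAIN;
    }
    // Multi-tenant: a role to assume is mandatory in this sketch.
    if (!roleArn.isPresent() || roleArn.get().trim().isEmpty()) {
      throw new IllegalArgumentException(
          "roleArn is required unless credential indirection is skipped");
    }
    return Mode.STS_ASSUME_ROLE;
  }

  public static void main(String[] args) {
    // The role ARN below is a made-up placeholder value.
    Optional<String> role = Optional.of("arn:aws:iam::123456789012:role/example");
    System.out.println(resolveMode(false, role));
    System.out.println(resolveMode(true, Optional.empty()));
  }
}
```

In this sketch the flag wins over any configured role, which matches the "hands-off" intent of the single-tenant mode; whether a real implementation should instead reject that combination is an open design question.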