Hi Dmitri,

I need the SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION option for my self-managed Polaris deployment. Company policy does not allow me to use credential vending, so I have to provide credentials through environment variables. I would prefer credential vending if it were allowed, but here environment variables are my only option.
Best,
CG

> On 6 May 2025, at 06:11, Dmitri Bourlatchkov <di...@apache.org> wrote:
>
> > SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION
>
> I do not think that this option is a solution for self-managed deployments at all.
>
> It effectively disables credential vending, which is still a valuable feature for self-managed cases.
>
> > So basically we'd have 2 modes of running Polaris [...]
>
> I'd really like to avoid having "running modes" in the sense of having this "mode" as a code-level config or flag.
>
> I believe configuration options should provide enough controls to the admin user to make Polaris behave in a certain way, but I believe those configs should apply to specific aspects of Polaris behaviour as opposed to defining an overarching "mode".
>
> For example, subscoping for vended credentials is valuable, IMHO, even in single-tenant deployments with a plain key/secret pair for authenticating STS connections.
>
> Cheers,
> Dmitri.
>
>> On Mon, May 5, 2025 at 2:36 PM Dennis Huo <huoi...@gmail.com> wrote:
>>
>> In general this sigv4 indirection control-flow should mirror the analogous patterns we apply on the StorageConfigInfo side (and perhaps long-term we can better consolidate the STS logic for the two), so I'd agree it's not even necessarily federation-specific.
>>
>> There's some precedent for the use-case of a "self-run Polaris" user wanting to just use simple server-wide configuration for StorageConfigInfo already: SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION
>>
>> https://github.com/apache/polaris/blob/4db7998381a61e9cab82cdc4fded6867b0bca464/service/common/src/main/java/org/apache/polaris/service/catalog/io/FileIOUtil.java#L92
>>
>> For this Catalog Federation sigv4 case we could introduce a similar feature configuration; whether or not this feature configuration is the exact way we want to do it long-term, it would make sense to refactor both the ConnectionConfig and StorageConfig parts together in the future.
>>
>> One important concept for this simple approach is that instead of getting into the business of having Polaris actually try to juggle long-lived credentials for IAM Users explicitly, this "simple case" can just inherit "environment-provided" credentials and let low-level SDK libraries use their default "credential chain" logic.
>>
>> So basically we'd have 2 modes of running Polaris:
>>
>> 1. Secure multi-tenant - Polaris will have opinionated/constrained scaffolding via layers of credential indirection, subscoping, secrets-management, etc.
>> 2. Single-tenant - Polaris will be more hands-off in terms of secrets management, instead allowing thick clients to use typical "environment-provided" credentials (e.g. environment variables, EC2 instance-metadata endpoint, local credential files, etc.)
>>
>>> On Fri, May 2, 2025 at 4:28 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>>>
>>> I think this discussion moves slightly out of the scope of catalog federation and into handling secrets :) ... but the points you're making are quite valid.
>>>
>>> Let's keep them in mind when we reopen the secrets handling discussion.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>>> On Fri, May 2, 2025 at 7:04 PM Rulin Xing <ru...@apache.org> wrote:
>>>>
>>>> Hi Dmitri,
>>>>
>>>> Totally agree that we need to recognize the self-managed deployment case as a first-class scenario. That means we should provide a way to configure Polaris with long-lived credentials.
>>>>
>>>> I see a few options for supporting this:
>>>>
>>>> 1. From env vars or server config, e.g.:
>>>>    * POLARIS_IAM_USER_AWS_ACCESS_KEY_ID
>>>>    * POLARIS_IAM_USER_AWS_SECRET_ACCESS_KEY
>>>>    * POLARIS_IAM_USER_ARN
>>>>    In this case, `roleArn` would not be required.
>>>>
>>>> 2. Configured via the Polaris Management API: Stick to `SigV4AuthenticationParameters`
>>>>
>>>> If we stick with the existing `SigV4AuthenticationParameters` type, we could:
>>>>    * Make roleArn optional
>>>>    * Add `iamUserAwsAccessKeyId` and `iamUserAwsSecretAccessKey` as optional fields
>>>>
>>>> 3. Configured via the Polaris Management API: Add new auth type
>>>>
>>>> We could create a new type to distinguish clearly:
>>>>    * New AuthenticationType enum: SIGV4_STS, SIGV4_STATIC_CREDS
>>>>
>>>> 4. Configured via the Polaris Management API: Add new auth types
>>>>
>>>> We could create a new sub type to distinguish clearly:
>>>>    e.g. new subtype under SigV4AuthenticationParameters: STS, CREDS
>>>>
>>>> Personally, I would prefer option 4. WDYT?
>>>>
>>>> I'll include these options in my PR as well for discussion.
>>>>
>>>> Best,
>>>> Rulin
>>>>
>>>> On 2025/05/02 17:16:44 Dmitri Bourlatchkov wrote:
>>>>> Thanks for your message, Rulin! You made good points and I agree with them.
>>>>>
>>>>> > I'm planning to introduce a `PolarisConnectionCredentialVendor`
>>>>>
>>>>> Looking forward to this proposal!
>>>>>
>>>>> > The goal is to draw a clear boundary between user-provided input and Polaris-generated service info [...]
>>>>>
>>>>> I support this goal, however, I'd like to emphasise that there may be some skew in different deployment models.
>>>>>
>>>>> Traditionally Polaris was envisioned as a service running for multiple users from distinct organisations, I guess. However, when Apache Polaris releases binary artifacts, users will be able to run their own deployments. In that situation, the boundary between what is configured at the deployment level and what is configured via the Polaris Management API may not be as sharp.
>>>>>
>>>>> I believe we need to recognise the self-managed deployment case and consider it as a mainstream case.
>>>>> I'm sure we're going to have some real users behind this use case soon.
>>>>>
>>>>> Specifically for the SigV4 authentication option in Federated Catalogs, I guess this means that users may want to use simpler key/secret pairs as input for secure connections to AWS services like Glue. In self-managed deployments this is not a security risk, from my POV.
>>>>>
>>>>> Would you consider it as a possible future enhancement?
>>>>>
>>>>> If yes, do you think it would fall under the proposed SigV4AuthenticationParameters (as a set of new optional attributes, perhaps), or would it be a different config type altogether? (This is related to my GH comment about type names, but the problem is bigger than just naming, I think.)
>>>>>
>>>>> I do not question that the STS / assume role path offers better security guarantees. My point is that it may still be valuable for OSS users to have simpler connection options.
>>>>>
>>>>> Thanks,
>>>>> Dmitri.
>>>>>
>>>>> On Thu, May 1, 2025 at 9:54 PM Rulin Xing <ru...@apache.org> wrote:
>>>>>
>>>>>> Hi Dmitri,
>>>>>>
>>>>>> Thanks for the thoughtful questions!
>>>>>>
>>>>>> 1. Does this assume the use of STS?
>>>>>>
>>>>>> Yes, the current spec changes assume the use of STS. Polaris acts as a service provider and assumes IAM roles provided by users to access AWS resources like Glue Catalogs. This model avoids long-lived credentials and enables secure, temporary access via STS-issued credentials.
>>>>>>
>>>>>> 2. Why is plain key/secret SigV4 not an option?
>>>>>>
>>>>>> We can support plain key/secret credentials for SigV4, particularly in self-managed deployments where users own both the Polaris deployment and AWS accounts. However, to reduce security risks, we don't want to store long-lived credentials directly in the catalog entity.
>>>>>> A more secure approach is to reference them using `UserSecretReference` (added by @dennishuo) and retrieve them through `UserSecretsManager`.
>>>>>>
>>>>>> 3. Where is Polaris expected to get credentials for STS requests?
>>>>>>
>>>>>> Polaris obtains credentials for STS calls from its own runtime environment, such as server config, environment variables, or cloud-native options like instance profiles. These are used to call AssumeRole on the user-provided IAM role.
>>>>>>
>>>>>> To support both temporary and static credential workflows, I'm planning to introduce a `PolarisConnectionCredentialVendor` (or `PolarisCredentialManager`) interface. This class will:
>>>>>> * Provide Polaris-generated service info (what we call vendor info) such as `userArn`, `externalId`, `consentUrl`, or `gcsServiceAccount`, which will be injected into the catalog entity's connection config / storage config. This info is exposed to users when they load the catalog entity and is needed for setting up the appropriate permissions (e.g., allowing Polaris to assume roles).
>>>>>> * Retrieve temporary credentials from cloud providers (e.g., AWS STS, Azure identity services) when needed to perform authenticated operations.
>>>>>>
>>>>>> The goal is to draw a clear boundary between user-provided input and Polaris-generated service info (something that's currently unclear in storage configs). In the long term, we're aiming to unify both connection and storage credential handling in this interface to simplify the overall architecture and improve security.
>>>>>>
>>>>>> Best,
>>>>>> Rulin
>>>>>>
>>>>>> On 2025/05/01 22:02:32 Dmitri Bourlatchkov wrote:
>>>>>>> Hi Rulin,
>>>>>>>
>>>>>>> Thanks for the informative description in the PR!
>>>>>>>
>>>>>>> It looks like the authentication method relies on STS.
>>>>>>> As such it is a sub-case of SigV4, I believe, because SigV4 can be used with plain key/secret credentials without assuming a role.
>>>>>>>
>>>>>>> If that is so, could you clarify that in the description?
>>>>>>>
>>>>>>> Is there any particular reason for not supporting plain key/secret credentials?
>>>>>>>
>>>>>>> When STS is in use, where is Polaris expected to get credentials for STS requests?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, May 1, 2025 at 5:37 PM Rulin Xing <ru...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> Just wanted to surface a new API spec update proposal related to Catalog Federation:
>>>>>>>>
>>>>>>>> https://github.com/apache/polaris/pull/1506
>>>>>>>>
>>>>>>>> This adds support for AWS SigV4 authentication, enabling Polaris to federate to external Iceberg REST catalogs hosted behind services like AWS Glue, S3Tables, or API Gateway.
>>>>>>>>
>>>>>>>> It builds on earlier federation work and introduces a set of properties to support role assumption and request signing via SigV4.
>>>>>>>>
>>>>>>>> Feedback on the spec or implementation is welcome!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Rulin
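Dennis's "simple case" depends on the SDK's default credential chain. The chain tries several credential sources in order and uses the first one that returns credentials. Below is a minimal, stdlib-only Java sketch of that idea. It is not Polaris code and not the AWS SDK. Only the standard AWS environment variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) are real. The types `StaticCredentials`, `CredentialSource`, `EnvironmentSource`, and `CredentialChain` are hypothetical. The real AWS SDK for Java v2 `DefaultCredentialsProvider` consults more sources, including system properties, profile files, and container and instance metadata.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical key/secret pair; sessionToken is null for long-lived IAM user keys.
record StaticCredentials(String accessKeyId, String secretAccessKey, String sessionToken) {}

// Hypothetical link in a credential chain: returns credentials only if this source has them.
interface CredentialSource {
  Optional<StaticCredentials> resolve();
}

// Reads the standard AWS environment variables from an injected map
// (pass System.getenv() in a real process, a fixed map in tests).
record EnvironmentSource(Map<String, String> env) implements CredentialSource {
  @Override
  public Optional<StaticCredentials> resolve() {
    String keyId = env.get("AWS_ACCESS_KEY_ID");
    String secret = env.get("AWS_SECRET_ACCESS_KEY");
    if (keyId == null || keyId.isBlank() || secret == null || secret.isBlank()) {
      return Optional.empty(); // incomplete pair: let the next source try
    }
    return Optional.of(new StaticCredentials(keyId, secret, env.get("AWS_SESSION_TOKEN")));
  }
}

// Tries each source in order and returns the first credentials found.
record CredentialChain(List<CredentialSource> sources) {
  StaticCredentials resolveOrThrow() {
    for (CredentialSource source : sources) {
      Optional<StaticCredentials> found = source.resolve();
      if (found.isPresent()) {
        return found.get();
      }
    }
    throw new IllegalStateException("no credentials found in any source");
  }
}
```

Credential resolution is kept separate from any subscoping step in this sketch. As a result, the two can be configured independently, which is in the spirit of Dmitri's preference for per-aspect options over a single running "mode".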
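Rulin's option 4 proposes one `SigV4AuthenticationParameters` type with STS and CREDS subtypes. He also argues that long-lived keys should be stored as references rather than raw values. Together, these could be modelled as a sealed type. The sketch below is hypothetical: `SigV4Auth`, `Sts`, `StaticCreds`, `SecretRef`, and `SigV4AuthDescriber` are illustrative names, not the Polaris Management API or its `UserSecretReference` type. It needs Java 17 or later for sealed interfaces.

```java
import java.util.Objects;

// Hypothetical stand-in for a reference to a secret held by a secrets manager;
// only this reference would be persisted in the catalog entity, never the raw key.
record SecretRef(String urn) {
  SecretRef {
    Objects.requireNonNull(urn, "urn");
  }
}

// Hypothetical "option 4" shape: one SigV4 parameter type with two subtypes.
sealed interface SigV4Auth permits SigV4Auth.Sts, SigV4Auth.StaticCreds {

  // Polaris assumes a user-supplied IAM role through STS, so roleArn is required here.
  record Sts(String roleArn) implements SigV4Auth {
    Sts {
      Objects.requireNonNull(roleArn, "roleArn is required for the STS subtype");
    }
  }

  // Long-lived IAM user keys, referenced indirectly; no roleArn is needed.
  record StaticCreds(SecretRef accessKeyId, SecretRef secretAccessKey) implements SigV4Auth {
    StaticCreds {
      Objects.requireNonNull(accessKeyId, "accessKeyId");
      Objects.requireNonNull(secretAccessKey, "secretAccessKey");
    }
  }
}

final class SigV4AuthDescriber {
  private SigV4AuthDescriber() {}

  // Each subtype gets its own branch; this sketch only returns a description.
  static String describe(SigV4Auth auth) {
    if (auth instanceof SigV4Auth.Sts sts) {
      return "assume role " + sts.roleArn();
    }
    if (auth instanceof SigV4Auth.StaticCreds creds) {
      return "static keys via " + creds.accessKeyId().urn();
    }
    throw new IllegalArgumentException("unknown SigV4Auth subtype: " + auth);
  }
}
```

Splitting the type this way keeps `roleArn` mandatory for the STS subtype without forcing it on the static-key subtype. By contrast, option 2 has to make `roleArn` optional on a single shared type.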