XJDKC commented on code in PR #2523: URL: https://github.com/apache/polaris/pull/2523#discussion_r2383513025
##########
polaris-core/src/main/java/org/apache/polaris/core/identity/registry/ServiceIdentityRegistry.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.core.identity.registry;
+
+import java.util.Optional;
+import org.apache.polaris.core.identity.ServiceIdentityType;
+import org.apache.polaris.core.identity.dpo.ServiceIdentityInfoDpo;
+import org.apache.polaris.core.identity.resolved.ResolvedServiceIdentity;
+
+/**
+ * A registry interface for managing and resolving service identities in Polaris.
+ *
+ * <p>In a multi-tenant Polaris deployment, each catalog or tenant may be associated with a distinct
+ * service identity that represents the Polaris service itself when accessing external systems
+ * (e.g., cloud services like AWS or GCP). This registry provides a central mechanism to manage
+ * those identities and resolve them at runtime.
+ *
+ * <p>The registry helps abstract the configuration and retrieval of service-managed credentials
+ * from the logic that uses them. It ensures a consistent and secure way to handle identity
+ * resolution across different deployment models, including SaaS and self-managed environments.
+ */
+public interface ServiceIdentityRegistry {
+  /**
+   * Discover a new {@link ServiceIdentityInfoDpo} for the given service identity type. Typically
+   * used during entity creation to associate a default or generated identity.
+   *
+   * @param serviceIdentityType The type of service identity (e.g., AWS_IAM).
+   * @return A new {@link ServiceIdentityInfoDpo} representing the discovered service identity.
+   */
+  Optional<ServiceIdentityInfoDpo> discoverServiceIdentity(ServiceIdentityType serviceIdentityType);
+
+  /**
+   * Resolves the given service identity by retrieving the actual credential or secret referenced by
+   * it, typically from a secret manager or internal credential store.
+   *
+   * @param serviceIdentityInfo The service identity metadata to resolve.
+   * @return A {@link ResolvedServiceIdentity} including credentials and other resolved data.
+   */
+  Optional<ResolvedServiceIdentity> resolveServiceIdentity(
+      ServiceIdentityInfoDpo serviceIdentityInfo);

Review Comment:
I'm okay with any name changes. I also mentioned this in the PR description, because I know `ServiceIdentityFactory` is probably not a good name :). For `discoverServiceIdentity`, we previously used `assignServiceIdentity`, which is almost the same as `allocateServiceIdentity`.

> 1. rename discoverServiceIdentity() to allocateServiceIdentity(ConnectionConfigInfo)

Is there a reason we'd want to pass in `ConnectionConfigInfo` specifically? The long-term goal is to use the same interface for assigning service identities across both storage config and connection config, so I'd like to keep it generic if possible. But if we do want to pass in the `ConnectionConfigInfo`, we can just return a `ConnectionConfigInfo` with the service identity already injected. I'm okay with either option.

> 2. update javadoc to clarify that different (or same) ServiceIdentityInfoDpo data may be produced in each call.

I have included it in the class-level javadoc, but I can add more for this interface specifically.
```
<p>In a multi-tenant Polaris deployment, each catalog or tenant may be associated with a distinct
 * service identity that represents the Polaris service itself when accessing external systems
 * (e.g., cloud services like AWS or GCP). This registry provides a central mechanism to manage
 * those identities and resolve them at runtime.
```

> 3. rename ServiceIdentityRegistry to ServiceIdentityFactory (just for clarity)

Totally open to better naming suggestions here. Personally, I'd prefer `ServiceIdentityProvider`. WDYT?

> 4. move resolveServiceIdentity() to ServiceCredentialsResolver.obtainServiceCredentials() (new interface)
> 5. make obtainServiceCredentials(ServiceIdentityInfoDpo) return EnumMap<ConnectionCredentialProperty, String> (or a simple wrapper object for it).

I don't think this separation works well. As I mentioned in the [comment](https://github.com/apache/polaris/pull/2523#discussion_r2380608969), `ServiceIdentityRegistry` should own both allocation and resolution of service identities. Its main duty is to interact with the remote secret manager to get the service identity info and its credential.

`PolarisCredentialManager` is the class responsible for obtaining temporary credentials (by assuming user-provided roles), so it will contain a lot of cloud-specific logic, but it will return a credentials map as the abstraction. We can also use a single cache to store both the connection credentials and the storage credentials. Mixing those responsibilities across the two classes would blur the boundaries.

Long-term, the split I'm envisioning is:

* `ServiceIdentityRegistry`: manage allocation and resolution of service identities.
* `PolarisCredentialManager`: handle fetching temporary credentials to access remote catalogs (and storage).

> a) Detach service "identity" from credentials to allow implementations that have a roleArn in the "identity" but produce credentials via custom code without directly referencing secrets (the SecretReference would be empty or "internal"). obtainServiceCredentials() would work based on runtime context (e.g. workload identity).

The `ResolvedServiceIdentity` could contain an empty credential and delegate to `PolarisCredentialManager` to meet this use case, right?

> Allow callers of obtainServiceCredentials() to simply propagate the config to the connections. Different connection types would be handled inside obtainServiceCredentials(), e.g. by looking up a handler class by ServiceIdentityType.

The `PolarisCredentialManager` will be the central class that handles this (for different external services and different cloud providers).
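
To make the proposed split a bit more concrete, here is a minimal Java sketch. Every type and method name in it (`IdentityInfo`, `ResolvedIdentity`, `IdentityRegistry`, `CredentialManager`, `allocate`, `resolve`, `temporaryCredentials`) is a hypothetical stand-in and not the actual Polaris API, and the in-memory map only imitates a remote secret manager. It illustrates one possible reading of the idea: the registry owns allocation and resolution, while the credential manager returns a plain credentials map and can fall back to runtime context when the resolved identity carries no credential.

```java
import java.util.Map;
import java.util.Optional;

public class IdentitySplitSketch {

  /** Hypothetical stand-in for the persisted identity metadata. */
  static final class IdentityInfo {
    final String type;            // e.g. "AWS_IAM"
    final String secretReference; // assumed pointer into a secret store
    IdentityInfo(String type, String secretReference) {
      this.type = type;
      this.secretReference = secretReference;
    }
  }

  /** Hypothetical stand-in for a resolved identity; the credential may be empty. */
  static final class ResolvedIdentity {
    final IdentityInfo info;
    final Optional<String> credential;
    ResolvedIdentity(IdentityInfo info, Optional<String> credential) {
      this.info = info;
      this.credential = credential;
    }
  }

  /** Owns both allocation and resolution of service identities. */
  interface IdentityRegistry {
    Optional<IdentityInfo> allocate(String type);
    Optional<ResolvedIdentity> resolve(IdentityInfo info);
  }

  /** Turns a resolved identity into temporary credentials, exposed as a plain map. */
  interface CredentialManager {
    Map<String, String> temporaryCredentials(ResolvedIdentity identity);
  }

  /** Toy registry backed by an in-memory map instead of a remote secret manager. */
  static final class InMemoryRegistry implements IdentityRegistry {
    private final Map<String, String> secrets;
    InMemoryRegistry(Map<String, String> secrets) {
      this.secrets = secrets;
    }
    @Override
    public Optional<IdentityInfo> allocate(String type) {
      return Optional.of(new IdentityInfo(type, "secrets/" + type));
    }
    @Override
    public Optional<ResolvedIdentity> resolve(IdentityInfo info) {
      // A missing secret yields an identity with an empty credential, not a failure.
      Optional<String> credential = Optional.ofNullable(secrets.get(info.secretReference));
      return Optional.of(new ResolvedIdentity(info, credential));
    }
  }

  /** Toy manager: records whether a stored secret or the runtime context would be used. */
  static final class ToyCredentialManager implements CredentialManager {
    @Override
    public Map<String, String> temporaryCredentials(ResolvedIdentity identity) {
      String source = identity.credential.isPresent() ? "stored-secret" : "runtime-context";
      return Map.of("identity.type", identity.info.type, "credential.source", source);
    }
  }

  public static void main(String[] args) {
    IdentityRegistry registry = new InMemoryRegistry(Map.of("secrets/AWS_IAM", "dummy-key"));
    CredentialManager manager = new ToyCredentialManager();
    IdentityInfo info = registry.allocate("AWS_IAM").orElseThrow();
    ResolvedIdentity resolved = registry.resolve(info).orElseThrow();
    // prints "stored-secret"
    System.out.println(manager.temporaryCredentials(resolved).get("credential.source"));
  }
}
```

In this sketch, callers only ever see the credentials map, so cloud-specific handling stays inside the manager and the registry never needs to know how temporary credentials are minted.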
