XJDKC commented on code in PR #2523:
URL: https://github.com/apache/polaris/pull/2523#discussion_r2383750881


##########
polaris-core/src/main/java/org/apache/polaris/core/identity/registry/ServiceIdentityRegistry.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.core.identity.registry;
+
+import java.util.Optional;
+import org.apache.polaris.core.identity.ServiceIdentityType;
+import org.apache.polaris.core.identity.dpo.ServiceIdentityInfoDpo;
+import org.apache.polaris.core.identity.resolved.ResolvedServiceIdentity;
+
+/**
+ * A registry interface for managing and resolving service identities in 
Polaris.
+ *
+ * <p>In a multi-tenant Polaris deployment, each catalog or tenant may be 
associated with a distinct
+ * service identity that represents the Polaris service itself when accessing 
external systems
+ * (e.g., cloud services like AWS or GCP). This registry provides a central 
mechanism to manage
+ * those identities and resolve them at runtime.
+ *
+ * <p>The registry helps abstract the configuration and retrieval of 
service-managed credentials
+ * from the logic that uses them. It ensures a consistent and secure way to 
handle identity
+ * resolution across different deployment models, including SaaS and 
self-managed environments.
+ */
+public interface ServiceIdentityRegistry {
+  /**
+   * Discover a new {@link ServiceIdentityInfoDpo} for the given service 
identity type. Typically
+   * used during entity creation to associate a default or generated identity.
+   *
+   * @param serviceIdentityType The type of service identity (e.g., AWS_IAM).
+   * @return A new {@link ServiceIdentityInfoDpo} representing the discovered 
service identity.
+   */
+  Optional<ServiceIdentityInfoDpo> discoverServiceIdentity(ServiceIdentityType 
serviceIdentityType);
+
+  /**
+   * Resolves the given service identity by retrieving the actual credential 
or secret referenced by
+   * it, typically from a secret manager or internal credential store.
+   *
+   * @param serviceIdentityInfo The service identity metadata to resolve.
+   * @return A {@link ResolvedServiceIdentity} including credentials and other 
resolved data.
+   */
+  Optional<ResolvedServiceIdentity> resolveServiceIdentity(
+      ServiceIdentityInfoDpo serviceIdentityInfo);

Review Comment:
   For the name, I think using a verb to explicitly express the intention of 
`allocating` a service identity is good in general. So the final version on my 
side of this method is:
   ```java
   interface ServiceIdentityProvider {
   Optional<ServiceIdentityInfoDpo> 
allocateServiceIdentity(ConnectionConfigInfo connectionConfig);
   }
   ```
   
   > Is it necessary to couple the "identity" (e.g. role ARN) with credentials?
   
   Not strictly, but in practice we usually configure them together. I agree 
there are cases where we just want to expose the service identity (e.g. in the 
loadCatalog response) without fetching creds, but since we cache service 
identity, the overhead isn’t significant.
   
   > I can easily imagine deployments where the Polaris service identity is 
configured statically, but credentials are acquired based on "workload 
identity".
   
   I assume we are talking about the aws credential chain: 
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html#credentials-default.
 The current implementation also supports this case, if we just want to rely on 
the cloud provider's sdk to infer the credentials, all you need is just 
configure the `user arn` in the `application.properties` file without 
specifying the actual aws credentials.
   Then `ResolvedServiceIdentity` can contain nothing but just a class like 
`DefaultAwsCredentialsProvider`, so it can infer the credentials based on the 
workload identity (`Amazon ECS container credentials`).
   
   For in-house / testing, we don't need to specify / configure anything, and 
it should just work. For real prod, we need to explicitly know which 
credentials we use, we should not relegate to `DefaultAwsCredentialsProvider`, 
that's why we don't recommend using `IMPLICIT` auth type.
   
   If we want to decouple them, we need **another class** to hold the `role 
arn` only, we already have many classes for service identity, don't want to 
make it more complicated. I can't come up with a clean solution for it without 
including the `role arn` to in the `ResolvedAwsIamServiceIdentity` or we add a 
new class to hold this info.
   
   The architecture is pretty clear now:
   1. `ServiceIdentityInfoModel`: generate based on open-api spec, it's an user 
facing class, used to tell users the identity polaris uses to access their 
resources (storage / catalog)
   2. `ServiceIdentityInfoDpo`: holds the reference to the service identity: 
will be persisted in metastore.
   3. `ResolvedServiceIdentity`: contains all the info of a resolved service 
identity, it can be converted to `ServiceIdentityInfoModel` to surface service 
identity info.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to