dimas-b commented on code in PR #2523:
URL: https://github.com/apache/polaris/pull/2523#discussion_r2383242264


##########
polaris-core/src/main/java/org/apache/polaris/core/identity/registry/ServiceIdentityRegistry.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.core.identity.registry;
+
+import java.util.Optional;
+import org.apache.polaris.core.identity.ServiceIdentityType;
+import org.apache.polaris.core.identity.dpo.ServiceIdentityInfoDpo;
+import org.apache.polaris.core.identity.resolved.ResolvedServiceIdentity;
+
+/**
+ * A registry interface for managing and resolving service identities in Polaris.
+ *
+ * <p>In a multi-tenant Polaris deployment, each catalog or tenant may be associated with a distinct
+ * service identity that represents the Polaris service itself when accessing external systems
+ * (e.g., cloud services like AWS or GCP). This registry provides a central mechanism to manage
+ * those identities and resolve them at runtime.
+ *
+ * <p>The registry helps abstract the configuration and retrieval of service-managed credentials
+ * from the logic that uses them. It ensures a consistent and secure way to handle identity
+ * resolution across different deployment models, including SaaS and self-managed environments.
+ */
+public interface ServiceIdentityRegistry {
+  /**
+   * Discover a new {@link ServiceIdentityInfoDpo} for the given service identity type. Typically
+   * used during entity creation to associate a default or generated identity.
+   *
+   * @param serviceIdentityType The type of service identity (e.g., AWS_IAM).
+   * @return A new {@link ServiceIdentityInfoDpo} representing the discovered service identity.
+   */
+  Optional<ServiceIdentityInfoDpo> discoverServiceIdentity(ServiceIdentityType serviceIdentityType);
+
+  /**
+   * Resolves the given service identity by retrieving the actual credential or secret referenced by
+   * it, typically from a secret manager or internal credential store.
+   *
+   * @param serviceIdentityInfo The service identity metadata to resolve.
+   * @return A {@link ResolvedServiceIdentity} including credentials and other resolved data.
+   */
+  Optional<ResolvedServiceIdentity> resolveServiceIdentity(
+      ServiceIdentityInfoDpo serviceIdentityInfo);

Review Comment:
   > [...] Then we can return a different service identity per catalog per 
realm.
   
   I can see that it is technically possible, but I'm not sure it is reasonable 
:sweat_smile:. In the current API, `ServiceIdentityInfoDpo` is a function of 
`ServiceIdentityType` alone, so returning logically different values from 
consecutive registry calls would be counter-intuitive IMHO, and hard to debug 
should issues arise.
   
   I was actually going to comment on handling service identities per 
catalog earlier, but I refrained, assuming you wanted realm-scoped service 
identities :sweat_smile: 
   
   Thanks for bearing with me during this review. I think I now have a better 
understanding of your intended implementation of this feature (which I did not 
have initially), and I hope my refactoring suggestions (below) make sense.
   
   Currently `discoverServiceIdentity()` is called during catalog creation, so 
the catalog is not available as an "entity" at the time of this call.
   
   In order to make per-catalog service identities easier to reason about I'd 
like to propose:
   1) rename `discoverServiceIdentity()` to 
`allocateServiceIdentity(ConnectionConfigInfo)`
   2) update the javadoc to clarify that different (or the same) 
`ServiceIdentityInfoDpo` data may be produced in each call.
   3) rename `ServiceIdentityRegistry` to `ServiceIdentityFactory` (just for 
clarity)
   4) move `resolveServiceIdentity()` to 
`ServiceCredentialsResolver.obtainServiceCredentials()` (new interface)
   5) make `obtainServiceCredentials(ServiceIdentityInfoDpo)` return 
`EnumMap<ConnectionCredentialProperty, String>` (or a simple wrapper object for 
it).
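   
   To make points 1-3 a bit more concrete, here is a rough sketch of what the 
factory side could look like. All types below are simplified placeholders, not 
the real Polaris classes: the `catalogName` field, the `OTHER` constant, the 
`PerCatalogIdentityFactory` class, and the role ARN naming scheme are made up 
purely for illustration.

```java
import java.util.Optional;

// Placeholder stand-ins, NOT the real Polaris types.
enum ServiceIdentityType { AWS_IAM, OTHER } // OTHER is hypothetical

// Hypothetical shape: just enough connection config to pick an identity.
record ConnectionConfigInfo(String catalogName, ServiceIdentityType identityType) {}

// Hypothetical shape: identity metadata only, no credentials or secrets.
record ServiceIdentityInfoDpo(ServiceIdentityType type, String roleArn) {}

/** Proposed rename of ServiceIdentityRegistry (points 1-3). */
interface ServiceIdentityFactory {
  /**
   * Allocates identity data for the given connection config. Different (or the same)
   * data may be produced in each call; callers must not assume a stable result.
   */
  Optional<ServiceIdentityInfoDpo> allocateServiceIdentity(ConnectionConfigInfo config);
}

/** Illustrative implementation that derives one role per catalog. */
final class PerCatalogIdentityFactory implements ServiceIdentityFactory {
  @Override
  public Optional<ServiceIdentityInfoDpo> allocateServiceIdentity(ConnectionConfigInfo config) {
    if (config.identityType() != ServiceIdentityType.AWS_IAM) {
      return Optional.empty(); // this sketch only knows how to allocate AWS_IAM identities
    }
    // Made-up naming scheme; a real implementation would decide this itself.
    String roleArn = "arn:aws:iam::000000000000:role/polaris-" + config.catalogName();
    return Optional.of(new ServiceIdentityInfoDpo(config.identityType(), roleArn));
  }
}
```

   Because the method takes the connection config rather than only the 
identity type, producing a different value per catalog no longer looks 
surprising at the call site.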
   
   From my personal POV, I tend to put more emphasis on class/method names used 
in the code and consider javadoc secondary because class/method names are more 
readily visible to developers while coding. So, my comments here may be a bit 
nit-picky :sweat_smile: 
   
   The ideas behind this refactoring:
   a) Detach service "identity" from credentials to allow implementations that 
have a `roleArn` in the "identity" but produce credentials via custom code 
without directly referencing secrets (the `SecretReference` would be empty or 
"internal"). `obtainServiceCredentials()` would work based on runtime context 
(e.g. workload identity).
   b) Allow callers of `obtainServiceCredentials()` to simply propagate the 
config to the connections. Different connection types would be handled inside 
`obtainServiceCredentials()`, e.g. by looking up a handler class by 
`ServiceIdentityType`.
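   
   A minimal sketch of ideas (a) and (b), again with placeholder types rather 
than the real Polaris classes. The `CredentialHandler` interface, the 
`HandlerBasedCredentialsResolver` class, the `register()` method, and the enum 
constants are all hypothetical names I'm using for illustration only.

```java
import java.util.EnumMap;
import java.util.Map;

// Placeholder stand-ins, NOT the real Polaris types.
enum ServiceIdentityType { AWS_IAM, OTHER } // OTHER is hypothetical

// Hypothetical constants; the real enum may define different properties.
enum ConnectionCredentialProperty { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN }

// Identity metadata only: a role ARN, no secret reference.
record ServiceIdentityInfoDpo(ServiceIdentityType type, String roleArn) {}

/** Proposed new interface (points 4-5). */
interface ServiceCredentialsResolver {
  EnumMap<ConnectionCredentialProperty, String> obtainServiceCredentials(
      ServiceIdentityInfoDpo identity);
}

/** Hypothetical per-type handler; it may rely on runtime context such as workload identity. */
interface CredentialHandler {
  EnumMap<ConnectionCredentialProperty, String> credentialsFor(ServiceIdentityInfoDpo identity);
}

/** Dispatches to a handler looked up by ServiceIdentityType, as in idea (b). */
final class HandlerBasedCredentialsResolver implements ServiceCredentialsResolver {
  private final Map<ServiceIdentityType, CredentialHandler> handlers =
      new EnumMap<>(ServiceIdentityType.class);

  HandlerBasedCredentialsResolver register(ServiceIdentityType type, CredentialHandler handler) {
    handlers.put(type, handler);
    return this;
  }

  @Override
  public EnumMap<ConnectionCredentialProperty, String> obtainServiceCredentials(
      ServiceIdentityInfoDpo identity) {
    CredentialHandler handler = handlers.get(identity.type());
    if (handler == null) {
      throw new IllegalArgumentException("No credential handler for " + identity.type());
    }
    return handler.credentialsFor(identity);
  }
}
```

   The caller only sees a property map and can pass it straight to the 
connection, while each handler is free to obtain credentials however it likes 
(secret manager, workload identity, etc.).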
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
