Re: [PR] API Spec: Add ConnectionConfigInfo to ExternalCatalog [polaris]

via GitHub Sat, 05 Apr 2025 10:01:40 -0700


dennishuo commented on code in PR #1026:
URL: https://github.com/apache/polaris/pull/1026#discussion_r2004463034



##########
spec/polaris-management-service.yml:
##########
@@ -850,9 +850,92 @@ components:
         - $ref: "#/components/schemas/Catalog"
         - type: object
           properties:
-            remoteUrl:
-              type: string
-              description: URL to the remote catalog API
+            connectionConfigInfo:
+              $ref: "#/components/schemas/ConnectionConfigInfo"
+
+    ConnectionConfigInfo:
+      type: object
+      description: A connection configuration representing a remote catalog 
service
+      properties:
+        connectionType:
+          type: string
+          enum:
+            - ICEBERG_REST
+          description: The type of remote catalog service represented by this 
connection
+        uri:
+          type: string
+          description: URI to the remote catalog service
+      required:
+        - connectionType
+      discriminator:
+        propertyName: connectionType
+        mapping:
+          ICEBERG_REST: "#/components/schemas/IcebergRestConnectionConfigInfo"
+
+    IcebergRestConnectionConfigInfo:
+      type: object
+      description: Configuration necessary for connecting to an Iceberg REST 
Catalog
+      allOf:
+        - $ref: '#/components/schemas/ConnectionConfigInfo'
+      properties:
+        remoteCatalogName:
+          type: string
+          description: The name of a remote catalog instance within the remote 
catalog service; in some older systems
+            this is specified as the 'warehouse' when multiple logical 
catalogs are served under the same base
+            uri, and often translates into a 'prefix' added to all REST 
resource paths
+        restAuthentication:
+          $ref: "#/components/schemas/AuthenticationParameters"
+
+    AuthenticationParameters:
+      type: object
+      description: Authentication-specific information for a REST connection
+      properties:
+        restAuthenticationType:
+          type: string
+          enum:
+            - OAUTH
+            - BEARER
+          description: The type of authentication to use when connecting to 
the remote rest service
+      required:
+        - restAuthenticationType
+      discriminator:
+        propertyName: restAuthenticationType
+        mapping:
+          OAUTH: "#/components/schemas/OAuthClientCredentialsParameters"
+          BEARER: "#/components/schemas/BearerAuthenticationParameters"
+
+    OAuthClientCredentialsParameters:
+      type: object
+      description: OAuth authentication based on client_id/client_secret
+      allOf:
+        - $ref: '#/components/schemas/AuthenticationParameters'
+      properties:
+        tokenUri:
+          type: string
+          description: Token server URI
+        clientId:
+          type: string
+          description: oauth client id
+        clientSecret:
+          type: string
+          format: password
+          description: oauth client secret (input-only)
+        scopes:
+          type: array
+          items:
+            type: string
+          description: oauth scopes to specify when exchanging for a 
short-lived access token
+
+    BearerAuthenticationParameters:
+      type: object
+      description: Bearer authentication directly embedded in request auth 
headers
+      allOf:
+        - $ref: '#/components/schemas/AuthenticationParameters'
+      properties:
+        bearerToken:

Review Comment:
   Yeah, the intent here is to support a variety of authentication models, this 
could be something we hash out more on the dev mailing list as well.
   
   I agree we could iterate quickly by marking the feature as alpha initially, 
as I suspect the end-to-end interaction will make the considerations more 
obvious as we iterate on it.
   
   As for "storing", I think there are two aspects:
   
   1. *Expressing* the secret directly in the ConnectionConfig struct of the 
API object
   2. *Persisting* the secret somewhere
   
   Are you concerned about the API structure for (1)?
   
   I agree we probably don't want (2) to default to storing plaintext in the 
persistence DB at any point in time. There are a few different models that 
could coexist for (2):
   
   1. Fully require an external "secret" to already exist as a reference, e.g. 
to a Vault URI, and specify that Vault URI in the config info -- but this only 
shifts the problem to the meta-secret for accessing Vault, so at some point the 
caller needs to configure some secret on the Polaris side
   2. Require a separate SecretIntegration type of Polaris entity to already be 
created whose sole purpose is plumbing the secrets into some other configurable 
secret store (e.g. Polaris would manage Vault in this case), but this is 
somewhat complex for the API and ultimately the same kinds of internal secrets 
plumbing would need to be built.
   3. (Plan to do this): Within Polaris's logic for handling entities that 
define secrets, we can have a SecretsManager class whose purpose is to take 
secrets from API models and actually store them somewhere else that's safe - 
Vault, KMS, some other keystore, etc., and then add internal references to the 
stored secret inside the Polaris entity so that the secret can be found when 
needed. Callsites that need the unpacked secret again go through the 
SecretsManager to "extract" the secret from the Polaris entity; the 
implementation of the SecretsManager gets to decide for itself how it wants to 
store a reference and then unpack it again.
   4. (Should consider also supporting this model): Similar to (3) but if we 
want non-uniform types of secrets but a uniform secrets-management layer, the 
SecretsManager might, instead of only storing a reference to an external 
resource, could be leasing a new encryption key from the external system and 
then embedding the encrypted secret in some field within the Polaris entity. 
Then when the callsite again needs the unpacked secret, this SecretsManager 
would recognize to decrypt the encrypted blob in order to retrieve the original 
secrets
   
   The nice thing about such a model is that it's fairly flexible for different 
secrets-management flows under a single interface. For example, to add direct 
sigv4-based auth to AWS Glue or S3Tables, the part where the SecretsManager 
transforms the entity would *not* actually be handling a secret directly, but 
instead performing the assumeRole relationship like how the 
StorageConfiguration does it; in this world, true "secrets" are only implicitly 
handled within the environment, but the flow looks the same -- embed the 
`userArn` and `externalId` into the Polaris entity, and then later when you 
want to unpack a secret, you use something externally-managed to become that 
`userArn` before doing an `assumeRole` to mint a new subscoped token.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] API Spec: Add ConnectionConfigInfo to ExternalCatalog [polaris]

Reply via email to