sungwy commented on code in PR #4707:
URL: https://github.com/apache/polaris/pull/4707#discussion_r3424956527


##########
polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java:
##########
@@ -209,6 +209,64 @@ public static void enforceFeatureEnabledOrThrow(
               .defaultValue(List.<String>of())
               .buildFeatureConfiguration();
 
+  // 
---------------------------------------------------------------------------
+  // GCS principal attribution via Workload Identity Federation
+  //
+  // GCP downscoped credentials have no session-tag mechanism (unlike AWS 
STS), and custom audit
+  // headers only reach GCS audit logs if the client forwards them. To 
attribute GCS data access
+  // to the Polaris principal for ANY client, credential vending can chain
+  // catalog-signed JWT -> STS token exchange -> per-catalog service-account 
impersonation, so the
+  // principal appears in serviceAccountDelegationInfo of every GCS Data 
Access audit log entry.
+  //
+  // Attribution activates automatically once the audience, issuer, and 
signing key file are all
+  // set (no on/off flag); it additionally requires a gcpServiceAccount on the 
storage config.
+  // 
---------------------------------------------------------------------------
+
+  public static final FeatureConfiguration<String> 
GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE =

Review Comment:
   A `StorageConfiguration` is defined per catalog (and already carries the 
`gcpServiceAccount`). Is there a reason the attribution config is configured 
per realm rather than per SA / `StorageConfiguration`? I'm wondering how it'd 
handle a realm with catalogs in different GCP projects. WDYT?



##########
polaris-core/src/main/java/org/apache/polaris/core/storage/gcp/GcpCredentialsStorageIntegration.java:
##########
@@ -176,17 +178,57 @@ private GcpStorageCredentialCacheKey buildCacheKey(
       @NonNull Set<String> writeLocations,
       @NonNull Optional<String> refreshEndpoint,
       @NonNull CredentialVendingContext context) {
+    // Principal attribution makes the vended token per-principal, so the 
principal must
+    // participate in cache identity; otherwise it is left empty to preserve 
cross-principal cache
+    // reuse. Attribution requires a service account to impersonate and a 
principal to attribute.
+    String principalName = "";
+    Optional<GcpAttributionParams> attributionParams = Optional.empty();
+    if (principalAttributionConfigured(realmConfig())
+        && storageConfig().getGcpServiceAccount() != null) {
+      principalName = context.principalName().orElse("");
+      if (!principalName.isEmpty()) {
+        attributionParams =
+            Optional.of(
+                GcpAttributionParams.of(
+                    realmConfig()
+                        
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),
+                    realmConfig()
+                        
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE),
+                    realmConfig()
+                        
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE),
+                    realmConfig()
+                        
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID)));
+      }
+    }
     return GcpStorageCredentialCacheKey.of(
         context.realm().orElse(""),
         storageConfig(),
         readLocations,
         listLocations,
         writeLocations,
         refreshEndpoint,
+        principalName,
         sourceCredentials,
         transportFactory,
         realmConfig(),
-        credentialOps);
+        credentialOps,
+        attributionParams);
+  }
+
+  /**
+   * Returns true when GCS principal attribution is fully configured (WIF 
audience, token issuer,
+   * and signing key file all set). There is intentionally no separate on/off 
flag.
+   */
+  private static boolean principalAttributionConfigured(RealmConfig 
realmConfig) {

Review Comment:
   Could we make this an explicit opt-in flag (like 
`GCS_PRINCIPAL_ATTRIBUTION_ENABLED`) with validation, rather than activating 
implicitly once all three are set? Today if an operator sets only some of them 
(typo, or misses one), this returns false and we silently fall back to the 
default vending path. This means a misconfigured setup looks fine on startup 
but attribution is quietly off. An explicit enable flag + a startup error when 
required values are missing would make a misconfiguration obvious.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to