sungwy commented on code in PR #4707:
URL: https://github.com/apache/polaris/pull/4707#discussion_r3424956527
##########
polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java:
##########
@@ -209,6 +209,64 @@ public static void enforceFeatureEnabledOrThrow(
.defaultValue(List.<String>of())
.buildFeatureConfiguration();
+ //
---------------------------------------------------------------------------
+ // GCS principal attribution via Workload Identity Federation
+ //
+ // GCP downscoped credentials have no session-tag mechanism (unlike AWS
STS), and custom audit
+ // headers only reach GCS audit logs if the client forwards them. To
attribute GCS data access
+ // to the Polaris principal for ANY client, credential vending can chain
+ // catalog-signed JWT -> STS token exchange -> per-catalog service-account
impersonation, so the
+ // principal appears in serviceAccountDelegationInfo of every GCS Data
Access audit log entry.
+ //
+ // Attribution activates automatically once the audience, issuer, and
signing key file are all
+ // set (no on/off flag); it additionally requires a gcpServiceAccount on the
storage config.
+ //
---------------------------------------------------------------------------
+
+ public static final FeatureConfiguration<String>
GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE =
Review Comment:
A `StorageConfiguration` is defined per catalog (and already carries the
`gcpServiceAccount`). Is there a reason the attribution config is configured
per realm rather than per SA / `StorageConfiguration`? I'm wondering how it'd
handle a realm with catalogs in different GCP projects. WDYT?
##########
polaris-core/src/main/java/org/apache/polaris/core/storage/gcp/GcpCredentialsStorageIntegration.java:
##########
@@ -176,17 +178,57 @@ private GcpStorageCredentialCacheKey buildCacheKey(
@NonNull Set<String> writeLocations,
@NonNull Optional<String> refreshEndpoint,
@NonNull CredentialVendingContext context) {
+ // Principal attribution makes the vended token per-principal, so the
principal must
+ // participate in cache identity; otherwise it is left empty to preserve
cross-principal cache
+ // reuse. Attribution requires a service account to impersonate and a
principal to attribute.
+ String principalName = "";
+ Optional<GcpAttributionParams> attributionParams = Optional.empty();
+ if (principalAttributionConfigured(realmConfig())
+ && storageConfig().getGcpServiceAccount() != null) {
+ principalName = context.principalName().orElse("");
+ if (!principalName.isEmpty()) {
+ attributionParams =
+ Optional.of(
+ GcpAttributionParams.of(
+ realmConfig()
+
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),
+ realmConfig()
+
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE),
+ realmConfig()
+
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE),
+ realmConfig()
+
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID)));
+ }
+ }
return GcpStorageCredentialCacheKey.of(
context.realm().orElse(""),
storageConfig(),
readLocations,
listLocations,
writeLocations,
refreshEndpoint,
+ principalName,
sourceCredentials,
transportFactory,
realmConfig(),
- credentialOps);
+ credentialOps,
+ attributionParams);
+ }
+
+ /**
+ * Returns true when GCS principal attribution is fully configured (WIF
audience, token issuer,
+ * and signing key file all set). There is intentionally no separate on/off
flag.
+ */
+ private static boolean principalAttributionConfigured(RealmConfig
realmConfig) {
Review Comment:
Could we make this an explicit opt-in flag (like
`GCS_PRINCIPAL_ATTRIBUTION_ENABLED`) with validation, rather than activating
implicitly once all three are set? Today if an operator sets only some of them
(typo, or misses one), this returns false and we silently fall back to the
default vending path. This means a misconfigured setup looks fine on startup
but attribution is quietly off. An explicit enable flag + a startup error when
required values are missing would make a misconfiguration obvious.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]