obelix74 commented on code in PR #4707:
URL: https://github.com/apache/polaris/pull/4707#discussion_r3433033153
##########
polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java:
##########
@@ -209,6 +209,82 @@ public static void enforceFeatureEnabledOrThrow(
.defaultValue(List.<String>of())
.buildFeatureConfiguration();
+ // ---------------------------------------------------------------------------
+ // GCS principal attribution via Workload Identity Federation
Review Comment:
Done — removed the block comment. The individual `description` fields on
each config constant are sufficient.
##########
polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java:
##########
@@ -209,6 +209,82 @@ public static void enforceFeatureEnabledOrThrow(
.defaultValue(List.<String>of())
.buildFeatureConfiguration();
+ // ---------------------------------------------------------------------------
+ // GCS principal attribution via Workload Identity Federation
+ //
+ // GCP downscoped credentials have no session-tag mechanism (unlike AWS STS), and custom audit
+ // headers only reach GCS audit logs if the client forwards them. To attribute GCS data access
+ // to the Polaris principal for ANY client, credential vending can chain
+ // catalog-signed JWT -> STS token exchange -> per-catalog service-account impersonation, so the
+ // principal appears in serviceAccountDelegationInfo of every GCS Data Access audit log entry.
+ //
+ // Attribution must be explicitly enabled via GCS_PRINCIPAL_ATTRIBUTION_ENABLED. When enabled,
+ // WIF_AUDIENCE, TOKEN_ISSUER, and SIGNING_KEY_FILE are all required; Polaris will throw at the
+ // first credential-vending attempt if any are missing. Additionally requires a gcpServiceAccount
+ // on the per-catalog StorageConfiguration.
+ // ---------------------------------------------------------------------------
+
+ public static final FeatureConfiguration<Boolean> GCS_PRINCIPAL_ATTRIBUTION_ENABLED =
+ PolarisConfiguration.<Boolean>builder()
+ .key("GCS_PRINCIPAL_ATTRIBUTION_ENABLED")
+ .description(
+ "Enables GCS principal attribution via Workload Identity Federation.\n"
+ + "When true, credential vending chains a catalog-signed JWT through an STS token\n"
+ + "exchange and service-account impersonation so the Polaris principal appears in GCS\n"
+ + "Data Access audit logs (serviceAccountDelegationInfo.principalSubject).\n"
+ + "Requires GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE, GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER,\n"
+ + "and GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE to also be set;\n"
+ + "a missing required value is a fatal configuration error.\n"
+ + "Also requires a gcpServiceAccount on the catalog StorageConfiguration.\n"
+ + "Default: false (attribution disabled).")
+ .defaultValue(false)
+ .buildFeatureConfiguration();
+
+ public static final FeatureConfiguration<String> GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE =
+ PolarisConfiguration.<String>builder()
+ .key("GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE")
+ .description(
+ "Full resource name of the Workload Identity Pool provider used for GCS principal\n"
+ + "attribution, e.g.\n"
+ + "//iam.googleapis.com/projects/<num>/locations/global/workloadIdentityPools/<pool>/providers/<provider>.\n"
+ + "Used as both the attribution JWT 'aud' claim and the STS token-exchange audience.\n"
+ + "Empty (default) disables principal attribution.")
Review Comment:
Fixed in 79fa1a380. Updated all three descriptions (`WIF_AUDIENCE`,
`TOKEN_ISSUER`, `SIGNING_KEY_FILE`) to say "Required when
`GCS_PRINCIPAL_ATTRIBUTION_ENABLED=true`; ignored otherwise."
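
For readers following the thread, the "required when enabled; ignored otherwise" rule could be checked roughly as below. This is only a sketch: the class `AttributionConfigCheckSketch`, its methods, and the plain `Map` of settings are hypothetical stand-ins, not the PR's code, which resolves values through `FeatureConfiguration`. Only the three key names come from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "required when enabled" rule; not the PR's code. */
final class AttributionConfigCheckSketch {

  // Key names taken from the diff; everything else in this class is assumed.
  static final List<String> REQUIRED_KEYS =
      List.of(
          "GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE",
          "GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER",
          "GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE");

  private AttributionConfigCheckSketch() {}

  /** Returns the required keys that are absent or blank; always empty when disabled. */
  static List<String> missingKeys(boolean enabled, Map<String, String> settings) {
    List<String> missing = new ArrayList<>();
    if (!enabled) {
      return missing; // the three settings are ignored when attribution is off
    }
    for (String key : REQUIRED_KEYS) {
      String value = settings.get(key);
      if (value == null || value.isBlank()) {
        missing.add(key);
      }
    }
    return missing;
  }

  /** Throws if attribution is enabled and any required setting is missing. */
  static void validate(boolean enabled, Map<String, String> settings) {
    List<String> missing = missingKeys(enabled, settings);
    if (!missing.isEmpty()) {
      throw new IllegalStateException(
          "GCS_PRINCIPAL_ATTRIBUTION_ENABLED=true but missing: " + String.join(", ", missing));
    }
  }
}
```

Reporting every missing key in one exception, rather than failing on the first, is a choice made for this sketch so an operator can fix the configuration in a single pass.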
##########
polaris-core/src/main/java/org/apache/polaris/core/storage/gcp/GcpAttributionSubjectBuilder.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.storage.gcp;
+
+/**
+ * Builds the {@code sub} claim for GCS principal-attribution JWTs as {@code <realm>/<principal>},
+ * within GCP's 127-character {@code google.subject} limit.
+ *
+ * <p>The character budget mirrors the AWS session-name builder: one character is reserved for the
+ * separator, then each field receives an equal share of the remainder, and budget unused by a short
+ * field flows to the other. ISO control characters and the {@code /} separator are stripped from
+ * each field so the subject stays unambiguously parseable, and the {@code unknown} placeholder
+ * substitutes null/empty fields so the subject shape stays stable.
+ */
+public final class GcpAttributionSubjectBuilder {
+
+ /** GCP limit for the {@code google.subject} attribute of a federated identity. */
+ public static final int MAX_SUBJECT_LENGTH = 127;
+
+ static final String SEPARATOR = "/";
+
+ static final String VALUE_UNKNOWN = "unknown";
+
+ private GcpAttributionSubjectBuilder() {}
+
+ /**
+ * Builds the attribution subject {@code <realm>/<principal>}, guaranteed to be at most {@value
+ * #MAX_SUBJECT_LENGTH} characters.
+ *
+ * @param realm the realm identifier (gets first-half budget priority)
+ * @param principalName the Polaris principal name
+ * @return the subject string
+ */
+ public static String buildSubject(String realm, String principalName) {
+ String cleanRealm = sanitize(realm);
+ String cleanPrincipal = sanitize(principalName);
+
+ int budget = MAX_SUBJECT_LENGTH - SEPARATOR.length();
+ int remaining = budget;
+
+ int realmAlloc = remaining / 2;
+ int realmUsed = Math.min(cleanRealm.length(), realmAlloc);
+ remaining -= realmUsed;
+
+ int principalUsed = Math.min(cleanPrincipal.length(), remaining);
+ remaining -= principalUsed;
+
+ // Carry-forward: if the principal left budget unused, the realm may take more than its
+ // initial half-share.
+ int realmFinal = Math.min(cleanRealm.length(), realmUsed + remaining);
Review Comment:
Updated the comment to say: "Unused budget from either field flows to the
other. A short realm gives principal more room via the larger `remaining`, and
a short principal gives realm more room here."
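
To make the two-way flow concrete, here is a standalone sketch of the budget arithmetic. The allocation steps mirror the quoted hunk, but the hunk is cut off after `realmFinal`, so the final concatenation, the `sanitize` body, and the class name `SubjectBudgetSketch` are assumptions based on the class Javadoc rather than the PR's actual code.

```java
/** Standalone sketch of the subject budget split; not the PR's implementation. */
final class SubjectBudgetSketch {

  static final int MAX_SUBJECT_LENGTH = 127;
  static final String SEPARATOR = "/";
  static final String VALUE_UNKNOWN = "unknown";

  private SubjectBudgetSketch() {}

  /**
   * Strips ISO control characters and '/', then substitutes the placeholder.
   * Assumption: a field that becomes empty after stripping is also treated as unknown.
   */
  static String sanitize(String value) {
    if (value == null) {
      return VALUE_UNKNOWN;
    }
    StringBuilder cleaned = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (!Character.isISOControl(c) && c != '/') {
        cleaned.append(c);
      }
    }
    return cleaned.length() == 0 ? VALUE_UNKNOWN : cleaned.toString();
  }

  static String buildSubject(String realm, String principalName) {
    String cleanRealm = sanitize(realm);
    String cleanPrincipal = sanitize(principalName);

    // 126 characters remain once the separator is reserved.
    int remaining = MAX_SUBJECT_LENGTH - SEPARATOR.length();

    // The realm first claims at most half of the budget (63).
    int realmUsed = Math.min(cleanRealm.length(), remaining / 2);
    remaining -= realmUsed;

    // A short realm leaves a larger 'remaining' for the principal.
    int principalUsed = Math.min(cleanPrincipal.length(), remaining);
    remaining -= principalUsed;

    // A short principal lets the realm grow past its initial half-share.
    int realmFinal = Math.min(cleanRealm.length(), realmUsed + remaining);

    // Assumed final step: truncate each field to its allocation and join.
    return cleanRealm.substring(0, realmFinal)
        + SEPARATOR
        + cleanPrincipal.substring(0, principalUsed);
  }
}
```

With two 200-character inputs each field is cut to 63 characters. With a 200-character realm and the principal `bob`, the realm is allowed 123 characters. With the realm `r` and a 200-character principal, the principal is allowed 125. The result is 127 characters long in all three cases.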
##########
polaris-core/src/main/java/org/apache/polaris/core/storage/gcp/GcpFederatedCredentialsExchanger.java:
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.storage.gcp;
+
+import com.auth0.jwt.JWT;
+import com.auth0.jwt.JWTCreator;
+import com.auth0.jwt.algorithms.Algorithm;
+import com.google.auth.http.HttpTransportFactory;
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.auth.oauth2.IdentityPoolCredentials;
+import com.google.common.annotations.VisibleForTesting;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.security.KeyFactory;
+import java.security.interfaces.RSAPrivateKey;
+import java.security.spec.PKCS8EncodedKeySpec;
+import java.time.Duration;
+import java.time.Instant;
+import java.util.Base64;
+import java.util.Date;
+import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Produces a GCP federated {@link GoogleCredentials} whose identity carries {@code
+ * <realm>/<principal>}, so that GCS Data Access audit logs attribute access to the requesting
+ * Polaris principal. This is the GCP counterpart of AWS STS session tags.
+ *
+ * <p>The federated credential is an {@link IdentityPoolCredentials} backed by a programmatic
+ * subject-token supplier: on each token refresh google-auth invokes the supplier, which mints a
+ * short-lived RS256 JWT ({@code sub = <realm>/<principal>}, {@code realm} claim), and exchanges it
+ * at the Workload Identity Pool provider's STS endpoint. The provider maps {@code google.subject =
+ * assertion.sub} and {@code attribute.realm = assertion.realm}; per-realm {@code attribute.realm}
+ * IAM bindings then enforce that a realm-A identity can only impersonate realm-A's service account.
+ * The returned credential is intended to be used as the source for tenant service-account
+ * impersonation (see {@link GcpCredentialsStorageIntegration}).
+ *
+ * <p>Network note: this performs an STS token exchange against {@code sts.googleapis.com} in
+ * addition to the existing {@code iamcredentials.googleapis.com} and {@code storage.googleapis.com}
+ * traffic.
+ */
+public class GcpFederatedCredentialsExchanger {
+
+ static final String STS_TOKEN_URL = "https://sts.googleapis.com/v1/token";
+ static final String SUBJECT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt";
+ static final String CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform";
+
+ /** Attribution JWTs are single-purpose and short-lived. */
+ static final Duration JWT_LIFETIME = Duration.ofMinutes(5);
+
+ /**
+ * JVM-wide cache of parsed signing keys, keyed by file path. The key file is a stable pod-mounted
+ * secret; parsing it (disk read + {@link KeyFactory}) once per path amortizes across vends rather
+ * than re-reading on every credential-cache miss. Key rotation is delivered by a process restart
+ * (the secret is mounted at startup), which clears this cache.
+ */
+ private static final ConcurrentHashMap<Path, RSAPrivateKey> SIGNING_KEY_CACHE =
+ new ConcurrentHashMap<>();
Review Comment:
Added a note to the Javadoc: "In practice the map is bounded by the number
of realms in the deployment — each realm has at most one signing-key path — so
memory growth is proportional to realm count."
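
As a rough illustration of the caching pattern under discussion, the sketch below parses a key once per path using only JDK classes. It assumes an unencrypted PKCS#8 PEM file, consistent with the hunk's `PKCS8EncodedKeySpec` import. The class name `SigningKeyCacheSketch`, the path normalization, and the helper `cachedKeyCount` are hypothetical and not taken from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-path signing-key cache; a sketch of the pattern, not the PR's code. */
final class SigningKeyCacheSketch {

  // One entry per distinct key file; entries live until the process exits.
  private static final ConcurrentHashMap<Path, RSAPrivateKey> CACHE = new ConcurrentHashMap<>();

  private SigningKeyCacheSketch() {}

  /** Returns the parsed key, reading and parsing the file only on the first call per path. */
  static RSAPrivateKey load(Path keyFile) {
    // Normalizing is an assumption here so equivalent paths share one entry.
    return CACHE.computeIfAbsent(
        keyFile.toAbsolutePath().normalize(), SigningKeyCacheSketch::parse);
  }

  /** Number of cached keys, exposed only to illustrate the bound discussed above. */
  static int cachedKeyCount() {
    return CACHE.size();
  }

  private static RSAPrivateKey parse(Path keyFile) {
    try {
      String pem = Files.readString(keyFile);
      // Assumes an unencrypted PKCS#8 PEM ("BEGIN PRIVATE KEY") file.
      String base64 =
          pem.replace("-----BEGIN PRIVATE KEY-----", "")
              .replace("-----END PRIVATE KEY-----", "")
              .replaceAll("\\s", "");
      byte[] der = Base64.getDecoder().decode(base64);
      return (RSAPrivateKey)
          KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot parse RSA private key: " + keyFile, e);
    }
  }
}
```

If parsing throws, `computeIfAbsent` records no entry, so a later call retries the read. A successfully parsed key stays cached until the process exits, which matches the restart-based rotation described in the Javadoc.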