obelix74 commented on code in PR #3414:
URL: https://github.com/apache/polaris/pull/3414#discussion_r2692292002


##########
polaris-core/src/main/java/org/apache/polaris/core/storage/aws/AwsSessionTagsBuilder.java:
##########
@@ -78,24 +98,68 @@ public static List<Tag> buildSessionTags(String 
principalName, CredentialVending
             
.value(truncateTagValue(context.tableName().orElse(TAG_VALUE_UNKNOWN)))
             .build());
 
+    // Only include trace ID if it's present in the context.
+    // The context's traceId is only populated when 
INCLUDE_TRACE_ID_IN_SESSION_TAGS is enabled.
+    // This allows efficient credential caching when trace IDs are not needed 
in session tags.
+    context
+        .traceId()
+        .ifPresent(
+            traceId ->
+                tags.add(
+                    Tag.builder()
+                        .key(CredentialVendingContext.TAG_KEY_TRACE_ID)
+                        .value(truncateTagValue(traceId))
+                        .build()));
+
+    // Only include request ID if it's present in the context.
+    // The context's requestId is only populated when 
INCLUDE_REQUEST_ID_IN_SESSION_TAGS is enabled.
+    // This allows efficient credential caching when request IDs are not 
needed in session tags.
+    context
+        .requestId()
+        .ifPresent(
+            requestId ->
+                tags.add(
+                    Tag.builder()
+                        .key(CredentialVendingContext.TAG_KEY_REQUEST_ID)
+                        .value(truncateTagValue(requestId))
+                        .build()));
+
     return tags;
   }
 
   /**
-   * Truncates a tag value to fit within AWS STS limits. AWS limits session 
tag values to 256
-   * characters. Returns "unknown" placeholder for null or empty values to 
ensure consistent tag
-   * presence in CloudTrail.
+   * Prepares a tag value for AWS STS by sanitizing invalid characters and 
truncating to fit within
+   * AWS limits. AWS limits session tag values to 256 characters and restricts 
the allowed character
+   * set to Unicode letters, whitespace, numbers, and specific special 
characters (_.:/=+\-@).
+   *
+   * <p>Invalid characters are replaced with underscores to maintain value 
readability while
+   * ensuring STS acceptance. Returns "unknown" placeholder for null or empty 
values to ensure
+   * consistent tag presence in CloudTrail.
    *
-   * @param value the value to potentially truncate
-   * @return the truncated value, or "unknown" if value is null or empty
+   * @param value the value to sanitize and potentially truncate
+   * @return the sanitized and truncated value, or "unknown" if value is null 
or empty
    */
   static String truncateTagValue(String value) {
     if (value == null || value.isEmpty()) {
       return TAG_VALUE_UNKNOWN;
     }
-    if (value.length() <= MAX_TAG_VALUE_LENGTH) {
-      return value;
+    // Sanitize invalid characters first, then truncate
+    String sanitized = sanitizeTagValue(value);
+    if (sanitized.length() <= MAX_TAG_VALUE_LENGTH) {
+      return sanitized;
     }
-    return value.substring(0, MAX_TAG_VALUE_LENGTH);
+    return sanitized.substring(0, MAX_TAG_VALUE_LENGTH);
+  }
+
+  /**
+   * Sanitizes a tag value by replacing characters that are not allowed in AWS 
STS session tag
+   * values. AWS STS allows: Unicode letters, whitespace, numbers, and 
specific special characters
+   * (_.:/=+\-@). Invalid characters are replaced with underscores.
+   *
+   * @param value the value to sanitize (must not be null)
+   * @return the sanitized value with invalid characters replaced by 
underscores
+   */
+  static String sanitizeTagValue(String value) {
+    return INVALID_TAG_VALUE_CHARS.matcher(value).replaceAll("_");

Review Comment:
   @dimas-b 
   
   If `trace_id` is coming from OpenTelemetry 
(`Span.current().getSpanContext().getTraceId()`), then in practice it shouldn’t 
ever be “invalid” for AWS STS session tags.
        • **Format/charset**: OpenTelemetry trace IDs are represented as a 
32‑character lowercase hex string (`[0-9a-f]{32}`), and AWS session tag values 
allow letters/numbers plus a small set of symbols. Hex is entirely within that 
allowed set.
        • **Length**: 32 characters is far below the STS tag value limit (256).
        • **Validity gating in Polaris**: In 
StorageAccessConfigProvider#getCurrentTraceId(), we only include it when 
`spanContext.isValid()` is true; if the context is invalid (including “all 
zeros” or malformed), we return `Optional.empty()` and the `trace_id` tag won’t 
be added at all. 
   
   The code above was specifically added for `request_id`s since they can be 
arbitrary strings - it has been removed in the revert of the last commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to