Re: [PR] Fix origin_server_auth URL encoding for mixed-encoding URLs [trafficserver]

via GitHub Mon, 12 Jan 2026 14:49:48 -0800


Copilot commented on code in PR #12802:
URL: https://github.com/apache/trafficserver/pull/12802#discussion_r2684161719



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -132,43 +167,56 @@ isUriEncoded(const String &in, bool isObjectName)
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid encoded sequence found */
+        foundEncodedChar  = true;
+        pos              += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* lonely '%' should have been encoded with %25 according to the RFC */
         return false;
       }
     }
+
+    /* Any other character that reaches here should have been encoded but 
wasn't.
+     * This includes: space, '/', '(', ')', '[', ']', etc.
+     * Return false to indicate the string is not fully encoded. */
+    return false;
   }
 
-  return false;
+  /* If we found at least one encoded character and no unencoded special chars,
+   * the string is fully encoded. If no encoded chars were found, the string
+   * contains only unreserved chars (and possibly '/' for object names). */
+  return foundEncodedChar;

Review Comment:
   The `isUriEncoded` function returns `false` for strings containing only 
unreserved characters (like "/test/file.txt" or "plain-text.js"). While 
functionally correct (these strings will pass through decode/encode unchanged 
in `canonicalEncode`), this introduces unnecessary overhead. Consider 
optimizing by returning `true` when the loop completes without finding any 
characters that need encoding, indicating the string is already in its 
canonical form.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -132,43 +167,56 @@ isUriEncoded(const String &in, bool isObjectName)
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid encoded sequence found */
+        foundEncodedChar  = true;
+        pos              += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* lonely '%' should have been encoded with %25 according to the RFC */
         return false;
       }
     }
+
+    /* Any other character that reaches here should have been encoded but 
wasn't.
+     * This includes: space, '/', '(', ')', '[', ']', etc.
+     * Return false to indicate the string is not fully encoded. */
+    return false;
   }
 
-  return false;
+  /* If we found at least one encoded character and no unencoded special chars,
+   * the string is fully encoded. If no encoded chars were found, the string
+   * contains only unreserved chars (and possibly '/' for object names). */
+  return foundEncodedChar;
 }
 
 String
 canonicalEncode(const String &in, bool isObjectName)
 {
-  String canonical;
-  if (!isUriEncoded(in, isObjectName)) {
-    /* Not URI-encoded */
-    canonical = uriEncode(in, isObjectName);
-  } else {
-    /* URI-encoded, then don't encode since AWS does not encode which is not 
mentioned in the spec,
-     * asked AWS, still waiting for confirmation */
-    canonical = in;
+  if (isUriEncoded(in, isObjectName)) {
+    /* Fully URI-encoded, return as-is */
+    return in;
   }
 
+  /* Not fully encoded (either unencoded or partially encoded).
+   * Decode first to avoid double-encoding, then re-encode with AWS canonical 
rules.
+   *
+   * This fixes a bug where mixed-encoding URLs (e.g., 
/app/(channel)/%5B%5Bparts%5D%5D/)
+   * would cause signature mismatch. The parentheses were not encoded but 
brackets were.
+   *
+   * Example:
+   * - Input: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partially encoded)
+   * - Decode: /app/(channel)/[[parts]]/page.js
+   * - Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+   */
+  String decoded   = uriDecode(in);
+  String canonical = uriEncode(decoded, isObjectName);
   return canonical;

Review Comment:
   The decode-then-reencode approach in `canonicalEncode` may have unintended 
behavior with plus signs. If a URL contains "%2B" (encoded plus sign), it will 
be decoded to "+" and then re-encoded as "%20" (space) by `uriEncode` (lines 
125-128), changing its semantic meaning. While this matches the existing 
AWS-specific behavior where '+' is treated as space, it means that "%2B" 
sequences in mixed-encoded URLs will be converted to "%20". Consider adding a 
test case to document this behavior and verify it matches AWS SigV4 
expectations for plus sign handling in URI paths.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -132,43 +167,56 @@ isUriEncoded(const String &in, bool isObjectName)
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid encoded sequence found */
+        foundEncodedChar  = true;
+        pos              += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* lonely '%' should have been encoded with %25 according to the RFC */
         return false;
       }
     }
+
+    /* Any other character that reaches here should have been encoded but 
wasn't.
+     * This includes: space, '/', '(', ')', '[', ']', etc.
+     * Return false to indicate the string is not fully encoded. */
+    return false;
   }
 
-  return false;
+  /* If we found at least one encoded character and no unencoded special chars,
+   * the string is fully encoded. If no encoded chars were found, the string
+   * contains only unreserved chars (and possibly '/' for object names). */
+  return foundEncodedChar;

Review Comment:
   The `isUriEncoded` function accepts both uppercase and lowercase hex digits 
(via `std::isxdigit`), and `canonicalEncode` returns such strings unchanged. 
However, AWS SigV4 spec requires uppercase hex digits (e.g., "%2F" not "%2f"). 
URLs with lowercase hex encoding (e.g., "/test%2ffile") will bypass the 
decode/encode normalization and retain lowercase hex, potentially causing 
signature mismatches. Consider either: (1) modifying `isUriEncoded` to return 
false for lowercase hex digits, forcing normalization through decode/encode, or 
(2) add a case-normalization step for fully-encoded strings.



##########
tests/gold_tests/pluginTest/origin_server_auth/s3_url_encoding.test.py:
##########
@@ -0,0 +1,195 @@
+'''
+Test origin_server_auth URL encoding bug (YTSATS-4835)
+
+This test demonstrates a bug where mixed URL encoding in the request path
+causes signature mismatch with S3.
+
+Bug: When a URL has SOME characters encoded (e.g., %5B) but others NOT encoded
+(e.g., parentheses), the isUriEncoded() function incorrectly returns true,
+causing canonicalEncode() to skip re-encoding. This results in a signature
+calculated for a partially-encoded path, which won't match what S3 expects.
+
+Example:
+  Client sends:    /app/(channel)/%5B%5Bparts%5D%5D/page.js
+  isUriEncoded():  Returns true (finds %5B)
+  canonicalEncode(): Returns as-is (thinks already encoded)
+  Signature for:   /app/(channel)/%5B%5Bparts%5D%5D/page.js  <- WRONG
+  S3 expects:      /app/%28channel%29/%5B%5Bparts%5D%5D/page.js  <- CORRECT
+
+This test sends requests with various URL encodings and captures what
+the origin receives, including the Authorization header signature.
+'''
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+Test.testName = "origin_server_auth: URL encoding bug"
+Test.Summary = '''
+Test for S3 signature calculation with mixed URL encoding
+'''
+Test.ContinueOnFail = True
+
+
+class S3UrlEncodingTest:
+    """
+    Test class for verifying URL encoding behavior in S3 auth signature 
calculation.
+    """
+
+    def __init__(self):
+        self.setupOriginServer()
+        self.setupTS()
+
+    def setupOriginServer(self):
+        """
+        Create an origin server that captures and logs the request path and 
headers.
+        ATS preserves URL encoding from the client, so we need to match 
exactly.
+        """
+        self.server = Test.MakeOriginServer("s3_mock")
+
+        # Test case 1: Fully unencoded path with parentheses
+        # Client sends: /bucket/app/(channel)/test.js (unencoded)
+        # ATS forwards: /bucket/app/(channel)/test.js (as-is)
+        request1 = {
+            "headers": "GET /bucket/app/(channel)/test.js HTTP/1.1\r\nHost: 
s3.amazonaws.com\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": ""
+        }
+        response1 = {
+            "headers": "HTTP/1.1 200 OK\r\nConnection: 
close\r\nContent-Length: 7\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": "test1ok"
+        }
+        self.server.addResponse("sessionlog.log", request1, response1)
+
+        # Test case 2: Fully encoded parentheses
+        # Client sends: /bucket/app/%28channel%29/test.js (encoded)
+        # ATS forwards: /bucket/app/%28channel%29/test.js (as-is, preserves 
encoding)
+        request2 = {
+            "headers": "GET /bucket/app/%28channel%29/test.js 
HTTP/1.1\r\nHost: s3.amazonaws.com\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": ""
+        }
+        response2 = {
+            "headers": "HTTP/1.1 200 OK\r\nConnection: 
close\r\nContent-Length: 7\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": "test2ok"
+        }
+        self.server.addResponse("sessionlog.log", request2, response2)
+
+        # Test case 3: Mixed encoding - BUG CASE
+        # Client sends: /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js
+        #   - Parentheses () NOT encoded
+        #   - Brackets [] ARE encoded as %5B%5D
+        # ATS forwards: /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js (as-is)
+        #
+        # BUG: isUriEncoded() sees %5B -> returns true -> canonicalEncode() 
skips encoding

Review Comment:
   The comments on lines 7-10 and 97 describe the bug in present tense ("the 
isUriEncoded() function incorrectly returns true"), but this PR fixes that bug. 
These comments should be updated to past tense or rephrased to indicate this 
was the bug that has been fixed, for example: "This test verifies the fix for a 
bug where mixed URL encoding..." or "Previously, when a URL had SOME characters 
encoded..."



##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +142,156 @@ TEST_CASE("isUriEncoded(): reserved chars in the input", 
"[AWS][auth][utility]")
   CHECK(true == isUriEncoded(encoded));
 }
 
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ *   Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial decode)
+ *   isUriEncoded() sees %5B -> returns true (incorrectly assumes fully 
encoded)
+ *   canonicalEncode() returns as-is: /app/(channel)/%5B%5Bparts%5D%5D/page.js
+ *   Signature calculated for partially-encoded path
+ *   S3 expects signature for: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   Result: 403 signature mismatch

Review Comment:
   The comment block on lines 145-158 describes the bug in present tense ("BUG: 
isUriEncoded() returns true if ANY %XX sequence is found..."), but this PR 
fixes that bug. The comment should be updated to indicate this was a historical 
bug that has been fixed, for example: "This test verifies the fix for a bug 
where isUriEncoded() would return true if ANY %XX sequence was found..."
   ```suggestion
    * This test verifies the fix for a bug where isUriEncoded() would return 
true
    * if ANY %XX sequence was found, even if other reserved characters were NOT
    * encoded. That behavior caused canonicalEncode() to skip encoding, 
resulting
    * in a signature mismatch with S3.
    *
    * Historical example from YTSATS-4835:
    *   Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
    *   ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial 
decode)
    *   isUriEncoded() saw %5B -> returned true (incorrectly assumed fully 
encoded)
    *   canonicalEncode() returned as-is: 
/app/(channel)/%5B%5Bparts%5D%5D/page.js
    *   Signature was calculated for the partially-encoded path
    *   S3 expected the signature for: 
/app/%28channel%29/%5B%5Bparts%5D%5D/page.js
    *   Result: 403 signature mismatch (now prevented by this fix)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix origin_server_auth URL encoding for mixed-encoding URLs [trafficserver]

Reply via email to