Re: [PR] Fix origin_server_auth URL encoding for mixed-encoding URLs [trafficserver]

via GitHub Tue, 13 Jan 2026 09:38:27 -0800


Copilot commented on code in PR #12802:
URL: https://github.com/apache/trafficserver/pull/12802#discussion_r2687422452



##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -70,7 +70,8 @@ TEST_CASE("uriEncode(): encode reserved chars in an object 
name", "[AWS][auth][u
 
 TEST_CASE("isUriEncoded(): check an empty input", "[AWS][auth][utility]")
 {
-  CHECK(false == isUriEncoded(""));
+  // Empty string has no characters that need encoding - it's already canonical
+  CHECK(true == isUriEncoded(""));

Review Comment:
   The behavior change for empty strings is semantically reasonable (an empty 
string is already canonical), but this changes the return value from false to 
true for empty input. While the test has been updated to reflect this, ensure 
that all calling code handles this correctly, especially since the PR 
description doesn't explicitly mention this behavioral change for empty strings.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -66,6 +66,34 @@ base16Encode(const char *in, size_t inLen)
   return result.str();
 }
 
+/**
+ * @brief URI-decode a character string
+ *
+ * Decodes percent-encoded characters (e.g., %20 -> space, %2F -> /)
+ *
+ * @param in string to be URI decoded
+ * @return decoded string
+ */
+String
+uriDecode(const String &in)
+{
+  String result;
+  result.reserve(in.length()); /* Decoded string will be same size or smaller 
*/
+
+  for (size_t i = 0; i < in.length(); i++) {
+    if (in[i] == '%' && i + 2 < in.length() && std::isxdigit(in[i + 1]) && 
std::isxdigit(in[i + 2])) {

Review Comment:
   Using std::isxdigit with potentially signed char values (when char is 
signed) can cause undefined behavior if the value is negative. The arguments to 
isxdigit should be cast to unsigned char to ensure values in the range 0-255 
are correctly handled. This applies to all three calls: in[i + 1] and in[i + 2].
   ```suggestion
       if (in[i] == '%' && i + 2 < in.length() &&
           std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
           std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
   ```



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -66,6 +66,34 @@ base16Encode(const char *in, size_t inLen)
   return result.str();
 }
 
+/**
+ * @brief URI-decode a character string
+ *
+ * Decodes percent-encoded characters (e.g., %20 -> space, %2F -> /)
+ *
+ * @param in string to be URI decoded
+ * @return decoded string
+ */
+String
+uriDecode(const String &in)
+{
+  String result;
+  result.reserve(in.length()); /* Decoded string will be same size or smaller 
*/
+
+  for (size_t i = 0; i < in.length(); i++) {
+    if (in[i] == '%' && i + 2 < in.length() && std::isxdigit(in[i + 1]) && 
std::isxdigit(in[i + 2])) {
+      /* Decode %XX to character */
+      char hex[3]  = {in[i + 1], in[i + 2], '\0'};
+      result      += static_cast<char>(std::strtol(hex, nullptr, 16));
+      i           += 2; /* Skip past the hex digits */
+    } else {
+      result += in[i];
+    }
+  }
+
+  return result;
+}

Review Comment:
   The uriDecode function does not handle invalid UTF-8 sequences or characters 
outside the printable ASCII range. When decoding percent-encoded bytes that 
represent non-ASCII or invalid UTF-8 sequences, the function may produce 
strings with invalid characters. Consider adding validation or documenting this 
limitation, especially since this is used in a security-sensitive context for 
AWS signature calculation.



##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +149,254 @@ TEST_CASE("isUriEncoded(): reserved chars in the input", 
"[AWS][auth][utility]")
   CHECK(true == isUriEncoded(encoded));
 }
 
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ *   Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial decode)

Review Comment:
   The comment on line 159 describes a scenario where "ATS decodes to: 
/app/(channel)/%5B%5Bparts%5D%5D/page.js (partial decode)". However, this seems 
to imply that ATS performs partial decoding of the URL, which may not be 
accurate. According to the test setup and description, ATS preserves URL 
encoding from the client and forwards it as-is. The comment should clarify 
whether this partial decode is actually performed by ATS or if it's just 
describing what a hypothetical partial decode would look like.
   ```suggestion
    *   Partially decoded form involved in the bug: 
/app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial decode example)
    *   (ATS preserves the client’s URL encoding and forwards it as-is; this 
line illustrates the mixed-encoding state)
   ```



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
     char c = in[pos];
 
     if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
-      /* found a unreserved character which should not have been be encoded 
regardless
-       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.  */
+      /* Unreserved characters don't need encoding:
+       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE 
hex digits.
+         * If lowercase hex is found, return false to trigger normalization.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        pos += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* Lone '%' or incomplete sequence - needs encoding as %25 */
         return false;
       }
     }
+
+    /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+    return false;
   }
 
-  return false;
+  /* String is already in canonical form:
+   *   - Only contains unreserved chars, slashes (for object names), or
+   *   - Properly percent-encoded sequences with uppercase hex
+   * No decode/re-encode needed. */
+  return true;
 }
 
 String
 canonicalEncode(const String &in, bool isObjectName)
 {
-  String canonical;
-  if (!isUriEncoded(in, isObjectName)) {
-    /* Not URI-encoded */
-    canonical = uriEncode(in, isObjectName);
-  } else {
-    /* URI-encoded, then don't encode since AWS does not encode which is not 
mentioned in the spec,
-     * asked AWS, still waiting for confirmation */
-    canonical = in;
+  if (isUriEncoded(in, isObjectName)) {
+    /* Fully URI-encoded with uppercase hex, return as-is */
+    return in;
   }
 
+  /* Input needs normalization. This handles:
+   *   1. Unencoded strings - encode all reserved characters
+   *   2. Partially encoded - some chars encoded, some not (e.g., parentheses 
vs brackets)
+   *   3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+   *
+   * Decode first to get the raw string, then re-encode with AWS canonical 
rules.
+   *
+   * Example (mixed encoding):
+   *   Input:  /app/(channel)/%5B%5Bparts%5D%5D/page.js  (parentheses not 
encoded)
+   *   Decode: /app/(channel)/[[parts]]/page.js
+   *   Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+   *
+   * Example (lowercase hex):
+   *   Input:  /path/%5btest%5d/file.js  (lowercase hex)
+   *   Decode: /path/[test]/file.js
+   *   Encode: /path/%5Btest%5D/file.js  (uppercase hex)
+   */
+  String decoded   = uriDecode(in);
+  String canonical = uriEncode(decoded, isObjectName);
   return canonical;

Review Comment:
   The decode-then-reencode approach allocates two temporary strings for every 
URL that needs normalization. For high-traffic scenarios with many requests 
containing mixed encoding, this could have performance implications. Consider 
adding a fast-path optimization that checks if the string needs normalization 
before allocating the decode buffer, or document the performance 
characteristics of this approach.



##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +149,254 @@ TEST_CASE("isUriEncoded(): reserved chars in the input", 
"[AWS][auth][utility]")
   CHECK(true == isUriEncoded(encoded));
 }
 
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ *   Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial decode)
+ *   isUriEncoded() sees %5B -> returns true (incorrectly assumes fully 
encoded)
+ *   canonicalEncode() returns as-is: /app/(channel)/%5B%5Bparts%5D%5D/page.js
+ *   Signature calculated for partially-encoded path
+ *   S3 expects signature for: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   Result: 403 signature mismatch
+ */
+TEST_CASE("isUriEncoded(): mixed encoding with unencoded parentheses and 
encoded brackets", "[AWS][auth][utility]")
+{
+  // Path with parentheses NOT encoded but brackets ARE encoded
+  // This is the bug case from YTSATS-4835
+  const String mixedEncoding = "/app/(channel)/%5B%5Bparts%5D%5D/page.js";
+
+  // Fixed behavior: returns false because parentheses () are not encoded
+  // Even though %5B is present, the string is only PARTIALLY encoded
+  CHECK(false == isUriEncoded(mixedEncoding, /* isObjectName */ true));
+}
+
+TEST_CASE("isUriEncoded(): unencoded parentheses should indicate not fully 
encoded", "[AWS][auth][utility]")
+{
+  // Parentheses are reserved characters per AWS spec and should be encoded
+  // If we see unencoded parentheses, the string is NOT properly URI-encoded
+  const String withParens = "/app/(channel)/test.js";
+
+  // Returns false (no encoded chars found, string needs encoding)
+  CHECK(false == isUriEncoded(withParens, /* isObjectName */ true));
+
+  // After encoding, parentheses become %28 and %29
+  String encoded = uriEncode(withParens, /* isObjectName */ true);
+  CHECK(encoded == "/app/%28channel%29/test.js");
+}
+
+TEST_CASE("uriDecode(): decode percent-encoded characters", 
"[AWS][auth][utility]")
+{
+  // Test basic decoding
+  CHECK(uriDecode("%28") == "(");
+  CHECK(uriDecode("%29") == ")");
+  CHECK(uriDecode("%5B") == "[");
+  CHECK(uriDecode("%5D") == "]");
+  CHECK(uriDecode("%20") == " ");
+
+  // Test mixed content
+  CHECK(uriDecode("/app/%28channel%29/test.js") == "/app/(channel)/test.js");
+  CHECK(uriDecode("/app/%5B%5Bparts%5D%5D/page.js") == 
"/app/[[parts]]/page.js");
+
+  // Test passthrough of non-encoded content
+  CHECK(uriDecode("/app/test.js") == "/app/test.js");
+  CHECK(uriDecode("hello-world_123.txt") == "hello-world_123.txt");
+
+  // Test mixed encoded and non-encoded (the bug case)
+  CHECK(uriDecode("/app/(channel)/%5B%5Bparts%5D%5D/page.js") == 
"/app/(channel)/[[parts]]/page.js");
+}
+
+TEST_CASE("canonicalEncode(): mixed encoding produces correct canonical URI", 
"[AWS][auth][utility]")
+{
+  // Fixed behavior: canonicalEncode() now decodes first, then re-encodes
+  // This ensures consistent canonical output regardless of input encoding
+
+  const String mixedEncoding = "/app/(channel)/%5B%5Bparts%5D%5D/page.js";
+
+  String canonical = canonicalEncode(mixedEncoding, /* isObjectName */ true);
+
+  // Correct behavior: ALL reserved characters are encoded
+  CHECK(canonical == "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js");
+}
+
+TEST_CASE("canonicalEncode(): fully encoded input should pass through 
unchanged", "[AWS][auth][utility]")
+{
+  // When ALL reserved chars are properly encoded, canonicalEncode should 
return as-is
+  const String fullyEncoded = "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js";
+
+  String canonical = canonicalEncode(fullyEncoded, /* isObjectName */ true);
+  CHECK(canonical == fullyEncoded);
+}
+
+TEST_CASE("canonicalEncode(): unencoded input should be fully encoded", 
"[AWS][auth][utility]")
+{
+  // When NO encoding is present, canonicalEncode should encode everything
+  const String unencoded = "/app/(channel)/[[parts]]/page.js";
+
+  String canonical = canonicalEncode(unencoded, /* isObjectName */ true);
+  CHECK(canonical == "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js");
+}
+
+/*
+ * Test all S3 "safe" characters that are NOT SigV4 unreserved.
+ * These characters are safe for S3 key names but MUST be encoded for 
signature calculation.
+ * See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
+ */
+TEST_CASE("uriEncode(): S3 safe chars that need SigV4 encoding", 
"[AWS][auth][utility]")
+{
+  // S3 "safe" characters: ! - _ . * ' ( )
+  // SigV4 unreserved:     - _ . ~
+  // Characters that are S3 safe but need SigV4 encoding: ! * ' ( )
+
+  CHECK(uriEncode("!", false) == "%21");
+  CHECK(uriEncode("*", false) == "%2A");
+  CHECK(uriEncode("'", false) == "%27");
+  CHECK(uriEncode("(", false) == "%28");
+  CHECK(uriEncode(")", false) == "%29");
+
+  // Full path test with all these characters
+  const String safeChars = "/bucket/file!name*with'quotes(and)parens.js";
+  String       encoded   = uriEncode(safeChars, /* isObjectName */ true);
+  CHECK(encoded == "/bucket/file%21name%2Awith%27quotes%28and%29parens.js");
+}
+
+TEST_CASE("isUriEncoded(): mixed encoding with other S3 safe chars", 
"[AWS][auth][utility]")
+{
+  // Test mixed encoding with exclamation mark (S3 safe, needs SigV4 encoding)
+  CHECK(false == isUriEncoded("/path/file!name/%20space/", /* isObjectName */ 
true));
+
+  // Test mixed encoding with asterisk
+  CHECK(false == isUriEncoded("/path/file*name/%20space/", /* isObjectName */ 
true));
+
+  // Test mixed encoding with single quote
+  CHECK(false == isUriEncoded("/path/file'name/%20space/", /* isObjectName */ 
true));
+
+  // All properly encoded should return true
+  CHECK(true == isUriEncoded("/path/file%21name/%20space/", /* isObjectName */ 
true));
+  CHECK(true == isUriEncoded("/path/file%2Aname/%20space/", /* isObjectName */ 
true));
+  CHECK(true == isUriEncoded("/path/file%27name/%20space/", /* isObjectName */ 
true));
+}
+
+TEST_CASE("canonicalEncode(): handles all S3 safe chars correctly", 
"[AWS][auth][utility]")
+{
+  // Mixed encoding with exclamation mark
+  String mixed1 = "/path/file!name/%5Btest%5D/";
+  CHECK(canonicalEncode(mixed1, true) == "/path/file%21name/%5Btest%5D/");
+
+  // Mixed encoding with asterisk
+  String mixed2 = "/path/file*name/%5Btest%5D/";
+  CHECK(canonicalEncode(mixed2, true) == "/path/file%2Aname/%5Btest%5D/");
+
+  // Mixed encoding with single quote
+  String mixed3 = "/path/file'name/%5Btest%5D/";
+  CHECK(canonicalEncode(mixed3, true) == "/path/file%27name/%5Btest%5D/");
+
+  // All S3 safe chars unencoded
+  String allSafe = "/bucket/test!file*with'all(chars).js";
+  CHECK(canonicalEncode(allSafe, true) == 
"/bucket/test%21file%2Awith%27all%28chars%29.js");
+}
+
+/*
+ * BUG FIX TESTS: Copilot review identified these issues
+ *
+ * Issue 1: Lowercase hex digits
+ *   AWS SigV4 requires UPPERCASE hex digits in percent-encoding (e.g., %2F 
not %2f).
+ *   URLs with lowercase hex should be normalized to uppercase.
+ *
+ * Issue 2: Plus sign (%2B) handling
+ *   The decode-then-reencode approach may convert %2B (encoded +) to %20 
(space).
+ *   Need to document/verify this matches AWS SigV4 expectations.
+ */
+
+TEST_CASE("isUriEncoded(): lowercase hex digits should NOT be considered fully 
encoded", "[AWS][auth][utility]")
+{
+  // AWS SigV4 requires uppercase hex digits. URLs with lowercase hex need 
normalization.
+  // Example: %2f should become %2F after canonical encoding
+
+  // Lowercase hex - should return false to trigger normalization
+  CHECK(false == isUriEncoded("/path/file%2ftest", /* isObjectName */ true)); 
// lowercase 'f'
+  CHECK(true == isUriEncoded("/path/file%2Ftest", /* isObjectName */ true));  
// uppercase 'F' - this IS properly encoded
+  CHECK(false == isUriEncoded("/path/%5btest%5d", /* isObjectName */ true));  
// lowercase brackets
+  CHECK(true == isUriEncoded("/path/%5Btest%5D", /* isObjectName */ true));   
// uppercase brackets - properly encoded
+
+  // Mixed case - should return false
+  CHECK(false == isUriEncoded("/path/%5btest%5D", /* isObjectName */ true)); 
// mixed: lowercase 'b', uppercase 'D'
+  CHECK(false == isUriEncoded("/path/%5Btest%5d", /* isObjectName */ true)); 
// mixed: uppercase 'B', lowercase 'd'
+}
+
+TEST_CASE("canonicalEncode(): lowercase hex should be normalized to 
uppercase", "[AWS][auth][utility]")
+{
+  // AWS SigV4 requires uppercase hex digits
+  // Input with lowercase hex should be normalized to uppercase
+
+  // Lowercase slash encoding - when isObjectName=true, slashes are allowed 
unencoded
+  // So %2f decodes to / and stays as / (not re-encoded)
+  CHECK(canonicalEncode("/path/file%2ftest/", true) == "/path/file/test/");
+
+  // When isObjectName=false, slashes must be encoded, so %2f becomes %2F
+  CHECK(canonicalEncode("/path/file%2ftest/", false) == 
"%2Fpath%2Ffile%2Ftest%2F");
+
+  // Lowercase bracket encoding - brackets are always encoded
+  CHECK(canonicalEncode("/path/%5btest%5d/file.js", true) == 
"/path/%5Btest%5D/file.js");
+
+  // Mixed case should be normalized
+  CHECK(canonicalEncode("/path/%5btest%5D/file.js", true) == 
"/path/%5Btest%5D/file.js");
+
+  // Already uppercase should pass through
+  CHECK(canonicalEncode("/path/%5Btest%5D/file.js", true) == 
"/path/%5Btest%5D/file.js");
+}
+
+TEST_CASE("uriDecode(): lowercase hex decoding", "[AWS][auth][utility]")
+{
+  // uriDecode should handle both uppercase and lowercase hex
+  CHECK(uriDecode("%2f") == "/");
+  CHECK(uriDecode("%2F") == "/");
+  CHECK(uriDecode("%5b") == "[");
+  CHECK(uriDecode("%5B") == "[");
+  CHECK(uriDecode("/path/%5btest%5d") == "/path/[test]");
+  CHECK(uriDecode("/path/%5Btest%5D") == "/path/[test]");
+}
+
+TEST_CASE("canonicalEncode(): plus sign handling", "[AWS][auth][utility]")
+{
+  // Per AWS SigV4 spec, plus signs in URLs are treated as spaces and encoded 
as %20
+  // This test documents and verifies that behavior
+
+  // Literal plus sign should be encoded as %20 (treated as space per AWS spec)
+  CHECK(uriEncode("+", false) == "%20");

Review Comment:
   The test case name and assertion at line 369 states "Literal plus sign 
should be encoded as %20 (treated as space per AWS spec)" but this behavior 
contradicts standard URL encoding where a literal plus sign would typically be 
encoded as %2B, not %20. While the code comment at lines 365-366 explains this 
is intentional AWS behavior, the test should verify that this matches actual 
AWS SigV4 behavior. Consider adding a reference to the specific AWS 
documentation that states plus signs should be treated as spaces in the 
canonical URI.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
     char c = in[pos];
 
     if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
-      /* found a unreserved character which should not have been be encoded 
regardless
-       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.  */
+      /* Unreserved characters don't need encoding:
+       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE 
hex digits.
+         * If lowercase hex is found, return false to trigger normalization.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        pos += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* Lone '%' or incomplete sequence - needs encoding as %25 */
         return false;
       }
     }
+
+    /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+    return false;
   }
 
-  return false;
+  /* String is already in canonical form:
+   *   - Only contains unreserved chars, slashes (for object names), or
+   *   - Properly percent-encoded sequences with uppercase hex
+   * No decode/re-encode needed. */
+  return true;
 }
 
 String
 canonicalEncode(const String &in, bool isObjectName)
 {
-  String canonical;
-  if (!isUriEncoded(in, isObjectName)) {
-    /* Not URI-encoded */
-    canonical = uriEncode(in, isObjectName);
-  } else {
-    /* URI-encoded, then don't encode since AWS does not encode which is not 
mentioned in the spec,
-     * asked AWS, still waiting for confirmation */
-    canonical = in;
+  if (isUriEncoded(in, isObjectName)) {
+    /* Fully URI-encoded with uppercase hex, return as-is */
+    return in;
   }
 
+  /* Input needs normalization. This handles:
+   *   1. Unencoded strings - encode all reserved characters
+   *   2. Partially encoded - some chars encoded, some not (e.g., parentheses 
vs brackets)
+   *   3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+   *
+   * Decode first to get the raw string, then re-encode with AWS canonical 
rules.
+   *
+   * Example (mixed encoding):
+   *   Input:  /app/(channel)/%5B%5Bparts%5D%5D/page.js  (parentheses not 
encoded)
+   *   Decode: /app/(channel)/[[parts]]/page.js
+   *   Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+   *
+   * Example (lowercase hex):
+   *   Input:  /path/%5btest%5d/file.js  (lowercase hex)
+   *   Decode: /path/[test]/file.js
+   *   Encode: /path/%5Btest%5D/file.js  (uppercase hex)
+   */
+  String decoded   = uriDecode(in);
+  String canonical = uriEncode(decoded, isObjectName);
   return canonical;

Review Comment:
   The canonicalEncode function doesn't handle the case of double-encoded URLs 
(e.g., "%2528" which is an encoded "%28"). When such a URL is decoded once, it 
becomes "%28", which then gets re-encoded to "%2528" again, resulting in the 
original string. However, AWS might expect only a single level of decoding. 
Consider adding test cases for double-encoded URLs to verify the behavior 
matches AWS expectations, or document that double-encoding is not supported.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
     char c = in[pos];
 
     if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
-      /* found a unreserved character which should not have been be encoded 
regardless
-       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.  */
+      /* Unreserved characters don't need encoding:
+       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE 
hex digits.
+         * If lowercase hex is found, return false to trigger normalization.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        pos += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* Lone '%' or incomplete sequence - needs encoding as %25 */
         return false;
       }
     }
+
+    /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+    return false;
   }

Review Comment:
   Using std::isxdigit, std::isalnum, and std::islower with potentially signed 
char values (when char is signed) can cause undefined behavior if the value is 
negative or outside the range of unsigned char. The character being checked 
should be cast to unsigned char before being passed to these functions. This 
applies to multiple locations where in[pos], in[pos + 1], and in[pos + 2] are 
used with these functions.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
     char c = in[pos];
 
     if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
-      /* found a unreserved character which should not have been be encoded 
regardless
-       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.  */
+      /* Unreserved characters don't need encoding:
+       * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE 
hex digits.
+         * If lowercase hex is found, return false to trigger normalization.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        pos += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* Lone '%' or incomplete sequence - needs encoding as %25 */
         return false;
       }
     }
+
+    /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+    return false;
   }
 
-  return false;
+  /* String is already in canonical form:
+   *   - Only contains unreserved chars, slashes (for object names), or
+   *   - Properly percent-encoded sequences with uppercase hex
+   * No decode/re-encode needed. */
+  return true;
 }
 
 String
 canonicalEncode(const String &in, bool isObjectName)
 {
-  String canonical;
-  if (!isUriEncoded(in, isObjectName)) {
-    /* Not URI-encoded */
-    canonical = uriEncode(in, isObjectName);
-  } else {
-    /* URI-encoded, then don't encode since AWS does not encode which is not 
mentioned in the spec,
-     * asked AWS, still waiting for confirmation */
-    canonical = in;
+  if (isUriEncoded(in, isObjectName)) {
+    /* Fully URI-encoded with uppercase hex, return as-is */
+    return in;
   }
 
+  /* Input needs normalization. This handles:
+   *   1. Unencoded strings - encode all reserved characters
+   *   2. Partially encoded - some chars encoded, some not (e.g., parentheses 
vs brackets)
+   *   3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+   *
+   * Decode first to get the raw string, then re-encode with AWS canonical 
rules.
+   *
+   * Example (mixed encoding):
+   *   Input:  /app/(channel)/%5B%5Bparts%5D%5D/page.js  (parentheses not 
encoded)
+   *   Decode: /app/(channel)/[[parts]]/page.js
+   *   Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+   *
+   * Example (lowercase hex):
+   *   Input:  /path/%5btest%5d/file.js  (lowercase hex)
+   *   Decode: /path/[test]/file.js
+   *   Encode: /path/%5Btest%5D/file.js  (uppercase hex)
+   */
+  String decoded   = uriDecode(in);
+  String canonical = uriEncode(decoded, isObjectName);
   return canonical;

Review Comment:
   The decode-then-reencode approach could potentially be exploited if an 
attacker can inject URL-encoded data that normalizes to unexpected values. For 
instance, overlong UTF-8 sequences or other encoding tricks could be used. 
While AWS itself will normalize the URL, it's worth considering if there are 
edge cases where the normalization here could differ from AWS's normalization, 
leading to security issues. Consider adding explicit validation or documenting 
the security assumptions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix origin_server_auth URL encoding for mixed-encoding URLs [trafficserver]

Reply via email to