Copilot commented on code in PR #12802:
URL: https://github.com/apache/trafficserver/pull/12802#discussion_r2687387611


##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +142,254 @@ TEST_CASE("isUriEncoded(): reserved chars in the input", 
"[AWS][auth][utility]")
   CHECK(true == isUriEncoded(encoded));
 }
 
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ *   Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *   ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partial decode)

Review Comment:
   The comment states "ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js 
(partial decode)" which is misleading. The client actually sends this 
partially-encoded URL directly - ATS doesn't decode it. The issue is that the 
plugin's signature calculation needs to handle this mixed-encoding input 
correctly.
   
   Consider removing or clarifying the "ATS decodes to" line since it suggests 
ATS is performing a decode operation when it's actually just receiving a 
mixed-encoding URL from the client.
   ```suggestion
    *   ATS receives: /app/(channel)/%5B%5Bparts%5D%5D/page.js  (partially 
encoded path from client)
   ```



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -132,43 +168,68 @@ isUriEncoded(const String &in, bool isObjectName)
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid encoded sequence found, but AWS SigV4 requires UPPERCASE hex 
digits.
+         * If lowercase hex is found, return false to trigger normalization 
via decode/re-encode.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        foundEncodedChar  = true;
+        pos              += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* lonely '%' should have been encoded with %25 according to the RFC */
         return false;
       }
     }
+
+    /* Any other character that reaches here should have been encoded but 
wasn't.
+     * This includes: space, '/', '(', ')', '[', ']', etc.
+     * Return false to indicate the string is not fully encoded. */
+    return false;
   }
 
-  return false;
+  /* If we found at least one encoded character and no unencoded special chars,
+   * the string is fully encoded. If no encoded chars were found, the string
+   * contains only unreserved chars (and possibly '/' for object names). */
+  return foundEncodedChar;
 }
 
 String
 canonicalEncode(const String &in, bool isObjectName)
 {
-  String canonical;
-  if (!isUriEncoded(in, isObjectName)) {
-    /* Not URI-encoded */
-    canonical = uriEncode(in, isObjectName);
-  } else {
-    /* URI-encoded, then don't encode since AWS does not encode which is not 
mentioned in the spec,
-     * asked AWS, still waiting for confirmation */
-    canonical = in;
+  if (isUriEncoded(in, isObjectName)) {
+    /* Fully URI-encoded with uppercase hex, return as-is */
+    return in;
   }
 
+  /* Input needs normalization. This handles:
+   *   1. Unencoded strings - encode all reserved characters
+   *   2. Partially encoded - some chars encoded, some not (e.g., parentheses 
vs brackets)
+   *   3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+   *
+   * Decode first to get the raw string, then re-encode with AWS canonical 
rules.
+   *
+   * Example (mixed encoding):
+   *   Input:  /app/(channel)/%5B%5Bparts%5D%5D/page.js  (parentheses not 
encoded)
+   *   Decode: /app/(channel)/[[parts]]/page.js
+   *   Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+   *
+   * Example (lowercase hex):
+   *   Input:  /path/%5btest%5d/file.js  (lowercase hex)
+   *   Decode: /path/[test]/file.js
+   *   Encode: /path/%5Btest%5D/file.js  (uppercase hex)
+   */
+  String decoded   = uriDecode(in);
+  String canonical = uriEncode(decoded, isObjectName);

Review Comment:
   The decode-then-reencode approach has a semantic bug for lowercase 
hex-encoded plus signs. When the input contains `%2b` (lowercase), the function:
   1. Decodes `%2b` to literal `+` character (byte 0x2B)
   2. Re-encodes `+` as `%20` (per line 125-128 in uriEncode)
   
   This changes the meaning: `%2B` represents a literal plus sign in the object 
name, but `%20` represents a space character. The lowercase `%2b` should be 
normalized to uppercase `%2B`, not converted to `%20`.
   
   Similarly, this affects any character where the AWS SigV4 encoding rules 
differ from standard URI encoding.
   
   Consider a different approach: when lowercase hex digits are found, 
normalize them to uppercase directly without decode/reencode, or add special 
handling in uriEncode to distinguish between a decoded `+` that came from `%2B` 
vs. a literal `+` in the input.



##########
tests/gold_tests/pluginTest/origin_server_auth/s3_url_encoding.test.py:
##########
@@ -0,0 +1,195 @@
+'''
+Test origin_server_auth URL encoding bug (YTSATS-4835)
+
+This test demonstrates a bug where mixed URL encoding in the request path
+causes signature mismatch with S3.
+
+Bug: When a URL has SOME characters encoded (e.g., %5B) but others NOT encoded
+(e.g., parentheses), the isUriEncoded() function incorrectly returns true,
+causing canonicalEncode() to skip re-encoding. This results in a signature
+calculated for a partially-encoded path, which won't match what S3 expects.
+
+Example:
+  Client sends:    /app/(channel)/%5B%5Bparts%5D%5D/page.js
+  isUriEncoded():  Returns true (finds %5B)
+  canonicalEncode(): Returns as-is (thinks already encoded)
+  Signature for:   /app/(channel)/%5B%5Bparts%5D%5D/page.js  <- WRONG
+  S3 expects:      /app/%28channel%29/%5B%5Bparts%5D%5D/page.js  <- CORRECT
+
+This test sends requests with various URL encodings and captures what
+the origin receives, including the Authorization header signature.
+'''
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+Test.testName = "origin_server_auth: URL encoding bug"
+Test.Summary = '''
+Test for S3 signature calculation with mixed URL encoding
+'''
+Test.ContinueOnFail = True
+
+
+class S3UrlEncodingTest:
+    """
+    Test class for verifying URL encoding behavior in S3 auth signature 
calculation.
+    """
+
+    def __init__(self):
+        self.setupOriginServer()
+        self.setupTS()
+
+    def setupOriginServer(self):
+        """
+        Create an origin server that captures and logs the request path and 
headers.
+        ATS preserves URL encoding from the client, so we need to match 
exactly.
+        """
+        self.server = Test.MakeOriginServer("s3_mock")
+
+        # Test case 1: Fully unencoded path with parentheses
+        # Client sends: /bucket/app/(channel)/test.js (unencoded)
+        # ATS forwards: /bucket/app/(channel)/test.js (as-is)
+        request1 = {
+            "headers": "GET /bucket/app/(channel)/test.js HTTP/1.1\r\nHost: 
s3.amazonaws.com\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": ""
+        }
+        response1 = {
+            "headers": "HTTP/1.1 200 OK\r\nConnection: 
close\r\nContent-Length: 7\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": "test1ok"
+        }
+        self.server.addResponse("sessionlog.log", request1, response1)
+
+        # Test case 2: Fully encoded parentheses
+        # Client sends: /bucket/app/%28channel%29/test.js (encoded)
+        # ATS forwards: /bucket/app/%28channel%29/test.js (as-is, preserves 
encoding)
+        request2 = {
+            "headers": "GET /bucket/app/%28channel%29/test.js 
HTTP/1.1\r\nHost: s3.amazonaws.com\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": ""
+        }
+        response2 = {
+            "headers": "HTTP/1.1 200 OK\r\nConnection: 
close\r\nContent-Length: 7\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": "test2ok"
+        }
+        self.server.addResponse("sessionlog.log", request2, response2)
+
+        # Test case 3: Mixed encoding - BUG CASE
+        # Client sends: /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js
+        #   - Parentheses () NOT encoded
+        #   - Brackets [] ARE encoded as %5B%5D
+        # ATS forwards: /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js (as-is)
+        #
+        # BUG: isUriEncoded() sees %5B -> returns true -> canonicalEncode() 
skips encoding
+        # Result: Signature calculated for 
/bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js
+        # But S3 expects signature for 
/bucket/app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+        request3 = {
+            "headers": "GET /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js 
HTTP/1.1\r\nHost: s3.amazonaws.com\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": ""
+        }
+        response3 = {
+            "headers": "HTTP/1.1 200 OK\r\nConnection: 
close\r\nContent-Length: 7\r\n\r\n",
+            "timestamp": "1469733493.993",
+            "body": "test3ok"
+        }
+        self.server.addResponse("sessionlog.log", request3, response3)
+
+    def setupTS(self):
+        """Configure ATS with the origin_server_auth plugin."""
+        self.ts = Test.MakeATSProcess("ts")
+
+        self.ts.Disk.records_config.update(
+            {
+                'proxy.config.diags.debug.enabled': 1,
+                'proxy.config.diags.debug.tags': 'origin_server_auth',
+                'proxy.config.url_remap.pristine_host_hdr': 1,
+            })
+
+        # Copy the config file to the test directory and use it
+        self.ts.Setup.CopyAs('rules/s3_url_encoding.test_input', 
Test.RunDirectory)
+
+        # Remap rule with S3 auth
+        self.ts.Disk.remap_config.AddLine(
+            f'map http://s3.amazonaws.com/ 
http://127.0.0.1:{self.server.Variables.Port}/ '
+            f'@plugin=origin_server_auth.so '
+            f'@pparam=--config 
@pparam={Test.RunDirectory}/s3_url_encoding.test_input')
+
+    def test_unencoded_parentheses(self):
+        """
+        Test 1: Request with unencoded parentheses.
+        The plugin should encode () to %28%29 for the signature calculation.
+        """
+        tr = Test.AddTestRun("Unencoded parentheses in path")
+        tr.Processes.Default.Command = (
+            f'curl -s -v -o /dev/null '
+            f'-H "Host: s3.amazonaws.com" '
+            
f'"http://127.0.0.1:{self.ts.Variables.port}/bucket/app/(channel)/test.js"')
+        tr.Processes.Default.ReturnCode = 0
+        tr.Processes.Default.StartBefore(self.server)
+        tr.Processes.Default.StartBefore(self.ts)
+        tr.Processes.Default.Streams.stderr.Content = 
Testers.ContainsExpression("200 OK", "Expected 200 OK response")
+        tr.StillRunningAfter = self.ts
+
+    def test_encoded_parentheses(self):
+        """
+        Test 2: Request with URL-encoded parentheses.
+        ATS preserves the encoding when forwarding to origin.
+        """
+        tr = Test.AddTestRun("Encoded parentheses in path")
+        tr.Processes.Default.Command = (
+            f'curl -s -v -o /dev/null '
+            f'-H "Host: s3.amazonaws.com" '
+            
f'"http://127.0.0.1:{self.ts.Variables.port}/bucket/app/%28channel%29/test.js";')
+        tr.Processes.Default.ReturnCode = 0
+        tr.Processes.Default.Streams.stderr.Content = 
Testers.ContainsExpression("200 OK", "Expected 200 OK response")
+        tr.StillRunningAfter = self.ts
+
+    def test_mixed_encoding_bug(self):
+        """
+        Test 3: BUG CASE - Mixed encoding with unencoded parentheses and 
encoded brackets.
+
+        Client URL: /bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js
+        - Parentheses () are NOT encoded
+        - Brackets [] ARE encoded as %5B%5D
+
+        BUG: isUriEncoded() finds %5B and returns true, so canonicalEncode()
+        returns the path as-is without encoding the parentheses.
+
+        This causes signature mismatch because:
+        - Plugin calculates signature for: 
/bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js
+        - S3 expects signature for: 
/bucket/app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+        """
+        tr = Test.AddTestRun("Mixed encoding - BUG CASE")
+        tr.Processes.Default.Command = (
+            f'curl -s -v -o /dev/null '
+            f'-H "Host: s3.amazonaws.com" '
+            
f'"http://127.0.0.1:{self.ts.Variables.port}/bucket/app/(channel)/%5B%5Bparts%5D%5D/page.js"')
+        tr.Processes.Default.ReturnCode = 0
+        # This should succeed with our mock server, but would fail with real S3
+        # The test documents the bug - when fixed, the signature would be 
correct
+        tr.Processes.Default.Streams.stderr.Content = 
Testers.ContainsExpression(
+            "200 OK", "Expected 200 OK response (mock server accepts any 
signature)")

Review Comment:
   The comments and docstrings describe the bug that this PR is supposed to 
fix, but should be updated to reflect the fixed behavior. For example:
   
   Lines 7-10 still say "the isUriEncoded() function incorrectly returns true" 
but this should now say "the isUriEncoded() function now correctly returns 
false".
   
   Lines 97-99 still describe the old buggy behavior rather than documenting 
what the test verifies with the fix in place.
   
   Lines 170-175 similarly describe the old bug instead of the expected correct 
behavior after the fix.
   
   Consider updating these comments to describe the correct behavior being 
tested rather than the historical bug.



##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -132,43 +168,68 @@ isUriEncoded(const String &in, bool isObjectName)
       continue;
     }
 
-    if (' ' == c) {
-      /* space should have been encoded with %20 if the string was encoded */
-      return false;
-    }
-
-    if ('/' == c && !isObjectName) {
-      /* if this is not an object name '/' should have been encoded */
-      return false;
+    if ('/' == c && isObjectName) {
+      /* '/' is allowed unencoded in object names */
+      continue;
     }
 
     if ('%' == c) {
       if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) && 
std::isxdigit(in[pos + 2])) {
-        /* if string was encoded we should have exactly 2 hexadecimal chars 
following it */
-        return true;
+        /* Valid encoded sequence found, but AWS SigV4 requires UPPERCASE hex 
digits.
+         * If lowercase hex is found, return false to trigger normalization 
via decode/re-encode.
+         * See: 
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+         * "Letters in the hexadecimal value must be uppercase, for example 
"%1A"." */
+        if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+          return false; /* Lowercase hex needs normalization to uppercase */
+        }
+        foundEncodedChar  = true;
+        pos              += 2; /* Skip past the hex digits */
+        continue;
       } else {
-        /* lonely '%' should have been encoded with %25 according to the RFC 
so likely not encoded */
+        /* lonely '%' should have been encoded with %25 according to the RFC */
         return false;
       }
     }
+
+    /* Any other character that reaches here should have been encoded but 
wasn't.
+     * This includes: space, '/', '(', ')', '[', ']', etc.
+     * Return false to indicate the string is not fully encoded. */
+    return false;
   }
 
-  return false;
+  /* If we found at least one encoded character and no unencoded special chars,
+   * the string is fully encoded. If no encoded chars were found, the string
+   * contains only unreserved chars (and possibly '/' for object names). */
+  return foundEncodedChar;

Review Comment:
   The function returns `false` for strings containing only unreserved 
characters (and '/' for object names). This causes `canonicalEncode()` to 
perform unnecessary decode/re-encode operations even when the string is already 
in canonical form.
   
   For example, a simple path like "/bucket/test.txt" with `isObjectName=true` 
would return `false`, triggering decode/re-encode operations that produce the 
same output.
   
   Consider returning `true` when the string contains only unreserved 
characters (which are already in canonical form), or add a separate flag to 
track whether any non-unreserved characters were encountered. This would 
optimize the common case of paths that don't need any encoding.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to