Copilot commented on code in PR #12802:
URL: https://github.com/apache/trafficserver/pull/12802#discussion_r2687422452
##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -70,7 +70,8 @@ TEST_CASE("uriEncode(): encode reserved chars in an object
name", "[AWS][auth][u
TEST_CASE("isUriEncoded(): check an empty input", "[AWS][auth][utility]")
{
- CHECK(false == isUriEncoded(""));
+ // Empty string has no characters that need encoding - it's already canonical
+ CHECK(true == isUriEncoded(""));
Review Comment:
The behavior change for empty strings is semantically reasonable (an empty
string is already canonical), but this changes the return value from false to
true for empty input. While the test has been updated to reflect this, ensure
that all calling code handles this correctly, especially since the PR
description doesn't explicitly mention this behavioral change for empty strings.
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -66,6 +66,34 @@ base16Encode(const char *in, size_t inLen)
return result.str();
}
+/**
+ * @brief URI-decode a character string
+ *
+ * Decodes percent-encoded characters (e.g., %20 -> space, %2F -> /)
+ *
+ * @param in string to be URI decoded
+ * @return decoded string
+ */
+String
+uriDecode(const String &in)
+{
+ String result;
+ result.reserve(in.length()); /* Decoded string will be same size or smaller
*/
+
+ for (size_t i = 0; i < in.length(); i++) {
+ if (in[i] == '%' && i + 2 < in.length() && std::isxdigit(in[i + 1]) &&
std::isxdigit(in[i + 2])) {
Review Comment:
Using std::isxdigit with potentially signed char values (when char is
signed) can cause undefined behavior if the value is negative. The arguments to
isxdigit should be cast to unsigned char to ensure values in the range 0-255
are correctly handled. This applies to all three calls: in[i + 1] and in[i + 2].
```suggestion
if (in[i] == '%' && i + 2 < in.length() &&
std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
```
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -66,6 +66,34 @@ base16Encode(const char *in, size_t inLen)
return result.str();
}
+/**
+ * @brief URI-decode a character string
+ *
+ * Decodes percent-encoded characters (e.g., %20 -> space, %2F -> /)
+ *
+ * @param in string to be URI decoded
+ * @return decoded string
+ */
+String
+uriDecode(const String &in)
+{
+ String result;
+ result.reserve(in.length()); /* Decoded string will be same size or smaller
*/
+
+ for (size_t i = 0; i < in.length(); i++) {
+ if (in[i] == '%' && i + 2 < in.length() && std::isxdigit(in[i + 1]) &&
std::isxdigit(in[i + 2])) {
+ /* Decode %XX to character */
+ char hex[3] = {in[i + 1], in[i + 2], '\0'};
+ result += static_cast<char>(std::strtol(hex, nullptr, 16));
+ i += 2; /* Skip past the hex digits */
+ } else {
+ result += in[i];
+ }
+ }
+
+ return result;
+}
Review Comment:
The uriDecode function does not handle invalid UTF-8 sequences or characters
outside the printable ASCII range. When decoding percent-encoded bytes that
represent non-ASCII or invalid UTF-8 sequences, the function may produce
strings with invalid characters. Consider adding validation or documenting this
limitation, especially since this is used in a security-sensitive context for
AWS signature calculation.
##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +149,254 @@ TEST_CASE("isUriEncoded(): reserved chars in the input",
"[AWS][auth][utility]")
CHECK(true == isUriEncoded(encoded));
}
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ * Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ * ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js (partial decode)
Review Comment:
The comment on line 159 describes a scenario where "ATS decodes to:
/app/(channel)/%5B%5Bparts%5D%5D/page.js (partial decode)". However, this seems
to imply that ATS performs partial decoding of the URL, which may not be
accurate. According to the test setup and description, ATS preserves URL
encoding from the client and forwards it as-is. The comment should clarify
whether this partial decode is actually performed by ATS or if it's just
describing what a hypothetical partial decode would look like.
```suggestion
* Partially decoded form involved in the bug:
/app/(channel)/%5B%5Bparts%5D%5D/page.js (partial decode example)
* (ATS preserves the client’s URL encoding and forwards it as-is; this
line illustrates the mixed-encoding state)
```
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
char c = in[pos];
if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
- /* found a unreserved character which should not have been be encoded
regardless
- * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. */
+ /* Unreserved characters don't need encoding:
+ * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
continue;
}
- if (' ' == c) {
- /* space should have been encoded with %20 if the string was encoded */
- return false;
- }
-
- if ('/' == c && !isObjectName) {
- /* if this is not an object name '/' should have been encoded */
- return false;
+ if ('/' == c && isObjectName) {
+ /* '/' is allowed unencoded in object names */
+ continue;
}
if ('%' == c) {
if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) &&
std::isxdigit(in[pos + 2])) {
- /* if string was encoded we should have exactly 2 hexadecimal chars
following it */
- return true;
+ /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE
hex digits.
+ * If lowercase hex is found, return false to trigger normalization.
+ * See:
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+ * "Letters in the hexadecimal value must be uppercase, for example
"%1A"." */
+ if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+ return false; /* Lowercase hex needs normalization to uppercase */
+ }
+ pos += 2; /* Skip past the hex digits */
+ continue;
} else {
- /* lonely '%' should have been encoded with %25 according to the RFC
so likely not encoded */
+ /* Lone '%' or incomplete sequence - needs encoding as %25 */
return false;
}
}
+
+ /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+ return false;
}
- return false;
+ /* String is already in canonical form:
+ * - Only contains unreserved chars, slashes (for object names), or
+ * - Properly percent-encoded sequences with uppercase hex
+ * No decode/re-encode needed. */
+ return true;
}
String
canonicalEncode(const String &in, bool isObjectName)
{
- String canonical;
- if (!isUriEncoded(in, isObjectName)) {
- /* Not URI-encoded */
- canonical = uriEncode(in, isObjectName);
- } else {
- /* URI-encoded, then don't encode since AWS does not encode which is not
mentioned in the spec,
- * asked AWS, still waiting for confirmation */
- canonical = in;
+ if (isUriEncoded(in, isObjectName)) {
+ /* Fully URI-encoded with uppercase hex, return as-is */
+ return in;
}
+ /* Input needs normalization. This handles:
+ * 1. Unencoded strings - encode all reserved characters
+ * 2. Partially encoded - some chars encoded, some not (e.g., parentheses
vs brackets)
+ * 3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+ *
+ * Decode first to get the raw string, then re-encode with AWS canonical
rules.
+ *
+ * Example (mixed encoding):
+ * Input: /app/(channel)/%5B%5Bparts%5D%5D/page.js (parentheses not
encoded)
+ * Decode: /app/(channel)/[[parts]]/page.js
+ * Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *
+ * Example (lowercase hex):
+ * Input: /path/%5btest%5d/file.js (lowercase hex)
+ * Decode: /path/[test]/file.js
+ * Encode: /path/%5Btest%5D/file.js (uppercase hex)
+ */
+ String decoded = uriDecode(in);
+ String canonical = uriEncode(decoded, isObjectName);
return canonical;
Review Comment:
The decode-then-reencode approach allocates two temporary strings for every
URL that needs normalization. For high-traffic scenarios with many requests
containing mixed encoding, this could have performance implications. Consider
adding a fast-path optimization that checks if the string needs normalization
before allocating the decode buffer, or document the performance
characteristics of this approach.
##########
plugins/origin_server_auth/unit_tests/test_aws_auth_v4.cc:
##########
@@ -142,6 +149,254 @@ TEST_CASE("isUriEncoded(): reserved chars in the input",
"[AWS][auth][utility]")
CHECK(true == isUriEncoded(encoded));
}
+/*
+ * BUG: isUriEncoded() returns true if ANY %XX sequence is found, even if other
+ * reserved characters are NOT encoded. This causes canonicalEncode() to skip
+ * encoding, resulting in signature mismatch with S3.
+ *
+ * Example from YTSATS-4835:
+ * Client sends: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ * ATS decodes to: /app/(channel)/%5B%5Bparts%5D%5D/page.js (partial decode)
+ * isUriEncoded() sees %5B -> returns true (incorrectly assumes fully
encoded)
+ * canonicalEncode() returns as-is: /app/(channel)/%5B%5Bparts%5D%5D/page.js
+ * Signature calculated for partially-encoded path
+ * S3 expects signature for: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ * Result: 403 signature mismatch
+ */
+TEST_CASE("isUriEncoded(): mixed encoding with unencoded parentheses and
encoded brackets", "[AWS][auth][utility]")
+{
+ // Path with parentheses NOT encoded but brackets ARE encoded
+ // This is the bug case from YTSATS-4835
+ const String mixedEncoding = "/app/(channel)/%5B%5Bparts%5D%5D/page.js";
+
+ // Fixed behavior: returns false because parentheses () are not encoded
+ // Even though %5B is present, the string is only PARTIALLY encoded
+ CHECK(false == isUriEncoded(mixedEncoding, /* isObjectName */ true));
+}
+
+TEST_CASE("isUriEncoded(): unencoded parentheses should indicate not fully
encoded", "[AWS][auth][utility]")
+{
+ // Parentheses are reserved characters per AWS spec and should be encoded
+ // If we see unencoded parentheses, the string is NOT properly URI-encoded
+ const String withParens = "/app/(channel)/test.js";
+
+ // Returns false (no encoded chars found, string needs encoding)
+ CHECK(false == isUriEncoded(withParens, /* isObjectName */ true));
+
+ // After encoding, parentheses become %28 and %29
+ String encoded = uriEncode(withParens, /* isObjectName */ true);
+ CHECK(encoded == "/app/%28channel%29/test.js");
+}
+
+TEST_CASE("uriDecode(): decode percent-encoded characters",
"[AWS][auth][utility]")
+{
+ // Test basic decoding
+ CHECK(uriDecode("%28") == "(");
+ CHECK(uriDecode("%29") == ")");
+ CHECK(uriDecode("%5B") == "[");
+ CHECK(uriDecode("%5D") == "]");
+ CHECK(uriDecode("%20") == " ");
+
+ // Test mixed content
+ CHECK(uriDecode("/app/%28channel%29/test.js") == "/app/(channel)/test.js");
+ CHECK(uriDecode("/app/%5B%5Bparts%5D%5D/page.js") ==
"/app/[[parts]]/page.js");
+
+ // Test passthrough of non-encoded content
+ CHECK(uriDecode("/app/test.js") == "/app/test.js");
+ CHECK(uriDecode("hello-world_123.txt") == "hello-world_123.txt");
+
+ // Test mixed encoded and non-encoded (the bug case)
+ CHECK(uriDecode("/app/(channel)/%5B%5Bparts%5D%5D/page.js") ==
"/app/(channel)/[[parts]]/page.js");
+}
+
+TEST_CASE("canonicalEncode(): mixed encoding produces correct canonical URI",
"[AWS][auth][utility]")
+{
+ // Fixed behavior: canonicalEncode() now decodes first, then re-encodes
+ // This ensures consistent canonical output regardless of input encoding
+
+ const String mixedEncoding = "/app/(channel)/%5B%5Bparts%5D%5D/page.js";
+
+ String canonical = canonicalEncode(mixedEncoding, /* isObjectName */ true);
+
+ // Correct behavior: ALL reserved characters are encoded
+ CHECK(canonical == "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js");
+}
+
+TEST_CASE("canonicalEncode(): fully encoded input should pass through
unchanged", "[AWS][auth][utility]")
+{
+ // When ALL reserved chars are properly encoded, canonicalEncode should
return as-is
+ const String fullyEncoded = "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js";
+
+ String canonical = canonicalEncode(fullyEncoded, /* isObjectName */ true);
+ CHECK(canonical == fullyEncoded);
+}
+
+TEST_CASE("canonicalEncode(): unencoded input should be fully encoded",
"[AWS][auth][utility]")
+{
+ // When NO encoding is present, canonicalEncode should encode everything
+ const String unencoded = "/app/(channel)/[[parts]]/page.js";
+
+ String canonical = canonicalEncode(unencoded, /* isObjectName */ true);
+ CHECK(canonical == "/app/%28channel%29/%5B%5Bparts%5D%5D/page.js");
+}
+
+/*
+ * Test all S3 "safe" characters that are NOT SigV4 unreserved.
+ * These characters are safe for S3 key names but MUST be encoded for
signature calculation.
+ * See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
+ */
+TEST_CASE("uriEncode(): S3 safe chars that need SigV4 encoding",
"[AWS][auth][utility]")
+{
+ // S3 "safe" characters: ! - _ . * ' ( )
+ // SigV4 unreserved: - _ . ~
+ // Characters that are S3 safe but need SigV4 encoding: ! * ' ( )
+
+ CHECK(uriEncode("!", false) == "%21");
+ CHECK(uriEncode("*", false) == "%2A");
+ CHECK(uriEncode("'", false) == "%27");
+ CHECK(uriEncode("(", false) == "%28");
+ CHECK(uriEncode(")", false) == "%29");
+
+ // Full path test with all these characters
+ const String safeChars = "/bucket/file!name*with'quotes(and)parens.js";
+ String encoded = uriEncode(safeChars, /* isObjectName */ true);
+ CHECK(encoded == "/bucket/file%21name%2Awith%27quotes%28and%29parens.js");
+}
+
+TEST_CASE("isUriEncoded(): mixed encoding with other S3 safe chars",
"[AWS][auth][utility]")
+{
+ // Test mixed encoding with exclamation mark (S3 safe, needs SigV4 encoding)
+ CHECK(false == isUriEncoded("/path/file!name/%20space/", /* isObjectName */
true));
+
+ // Test mixed encoding with asterisk
+ CHECK(false == isUriEncoded("/path/file*name/%20space/", /* isObjectName */
true));
+
+ // Test mixed encoding with single quote
+ CHECK(false == isUriEncoded("/path/file'name/%20space/", /* isObjectName */
true));
+
+ // All properly encoded should return true
+ CHECK(true == isUriEncoded("/path/file%21name/%20space/", /* isObjectName */
true));
+ CHECK(true == isUriEncoded("/path/file%2Aname/%20space/", /* isObjectName */
true));
+ CHECK(true == isUriEncoded("/path/file%27name/%20space/", /* isObjectName */
true));
+}
+
+TEST_CASE("canonicalEncode(): handles all S3 safe chars correctly",
"[AWS][auth][utility]")
+{
+ // Mixed encoding with exclamation mark
+ String mixed1 = "/path/file!name/%5Btest%5D/";
+ CHECK(canonicalEncode(mixed1, true) == "/path/file%21name/%5Btest%5D/");
+
+ // Mixed encoding with asterisk
+ String mixed2 = "/path/file*name/%5Btest%5D/";
+ CHECK(canonicalEncode(mixed2, true) == "/path/file%2Aname/%5Btest%5D/");
+
+ // Mixed encoding with single quote
+ String mixed3 = "/path/file'name/%5Btest%5D/";
+ CHECK(canonicalEncode(mixed3, true) == "/path/file%27name/%5Btest%5D/");
+
+ // All S3 safe chars unencoded
+ String allSafe = "/bucket/test!file*with'all(chars).js";
+ CHECK(canonicalEncode(allSafe, true) ==
"/bucket/test%21file%2Awith%27all%28chars%29.js");
+}
+
+/*
+ * BUG FIX TESTS: Copilot review identified these issues
+ *
+ * Issue 1: Lowercase hex digits
+ * AWS SigV4 requires UPPERCASE hex digits in percent-encoding (e.g., %2F
not %2f).
+ * URLs with lowercase hex should be normalized to uppercase.
+ *
+ * Issue 2: Plus sign (%2B) handling
+ * The decode-then-reencode approach may convert %2B (encoded +) to %20
(space).
+ * Need to document/verify this matches AWS SigV4 expectations.
+ */
+
+TEST_CASE("isUriEncoded(): lowercase hex digits should NOT be considered fully
encoded", "[AWS][auth][utility]")
+{
+ // AWS SigV4 requires uppercase hex digits. URLs with lowercase hex need
normalization.
+ // Example: %2f should become %2F after canonical encoding
+
+ // Lowercase hex - should return false to trigger normalization
+ CHECK(false == isUriEncoded("/path/file%2ftest", /* isObjectName */ true));
// lowercase 'f'
+ CHECK(true == isUriEncoded("/path/file%2Ftest", /* isObjectName */ true));
// uppercase 'F' - this IS properly encoded
+ CHECK(false == isUriEncoded("/path/%5btest%5d", /* isObjectName */ true));
// lowercase brackets
+ CHECK(true == isUriEncoded("/path/%5Btest%5D", /* isObjectName */ true));
// uppercase brackets - properly encoded
+
+ // Mixed case - should return false
+ CHECK(false == isUriEncoded("/path/%5btest%5D", /* isObjectName */ true));
// mixed: lowercase 'b', uppercase 'D'
+ CHECK(false == isUriEncoded("/path/%5Btest%5d", /* isObjectName */ true));
// mixed: uppercase 'B', lowercase 'd'
+}
+
+TEST_CASE("canonicalEncode(): lowercase hex should be normalized to
uppercase", "[AWS][auth][utility]")
+{
+ // AWS SigV4 requires uppercase hex digits
+ // Input with lowercase hex should be normalized to uppercase
+
+ // Lowercase slash encoding - when isObjectName=true, slashes are allowed
unencoded
+ // So %2f decodes to / and stays as / (not re-encoded)
+ CHECK(canonicalEncode("/path/file%2ftest/", true) == "/path/file/test/");
+
+ // When isObjectName=false, slashes must be encoded, so %2f becomes %2F
+ CHECK(canonicalEncode("/path/file%2ftest/", false) ==
"%2Fpath%2Ffile%2Ftest%2F");
+
+ // Lowercase bracket encoding - brackets are always encoded
+ CHECK(canonicalEncode("/path/%5btest%5d/file.js", true) ==
"/path/%5Btest%5D/file.js");
+
+ // Mixed case should be normalized
+ CHECK(canonicalEncode("/path/%5btest%5D/file.js", true) ==
"/path/%5Btest%5D/file.js");
+
+ // Already uppercase should pass through
+ CHECK(canonicalEncode("/path/%5Btest%5D/file.js", true) ==
"/path/%5Btest%5D/file.js");
+}
+
+TEST_CASE("uriDecode(): lowercase hex decoding", "[AWS][auth][utility]")
+{
+ // uriDecode should handle both uppercase and lowercase hex
+ CHECK(uriDecode("%2f") == "/");
+ CHECK(uriDecode("%2F") == "/");
+ CHECK(uriDecode("%5b") == "[");
+ CHECK(uriDecode("%5B") == "[");
+ CHECK(uriDecode("/path/%5btest%5d") == "/path/[test]");
+ CHECK(uriDecode("/path/%5Btest%5D") == "/path/[test]");
+}
+
+TEST_CASE("canonicalEncode(): plus sign handling", "[AWS][auth][utility]")
+{
+ // Per AWS SigV4 spec, plus signs in URLs are treated as spaces and encoded
as %20
+ // This test documents and verifies that behavior
+
+ // Literal plus sign should be encoded as %20 (treated as space per AWS spec)
+ CHECK(uriEncode("+", false) == "%20");
Review Comment:
The test case name and assertion at line 369 states "Literal plus sign
should be encoded as %20 (treated as space per AWS spec)" but this behavior
contradicts standard URL encoding where a literal plus sign would typically be
encoded as %2B, not %20. While the code comment at lines 365-366 explains this
is intentional AWS behavior, the test should verify that this matches actual
AWS SigV4 behavior. Consider adding a reference to the specific AWS
documentation that states plus signs should be treated as spaces in the
canonical URI.
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
char c = in[pos];
if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
- /* found a unreserved character which should not have been be encoded
regardless
- * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. */
+ /* Unreserved characters don't need encoding:
+ * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
continue;
}
- if (' ' == c) {
- /* space should have been encoded with %20 if the string was encoded */
- return false;
- }
-
- if ('/' == c && !isObjectName) {
- /* if this is not an object name '/' should have been encoded */
- return false;
+ if ('/' == c && isObjectName) {
+ /* '/' is allowed unencoded in object names */
+ continue;
}
if ('%' == c) {
if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) &&
std::isxdigit(in[pos + 2])) {
- /* if string was encoded we should have exactly 2 hexadecimal chars
following it */
- return true;
+ /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE
hex digits.
+ * If lowercase hex is found, return false to trigger normalization.
+ * See:
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+ * "Letters in the hexadecimal value must be uppercase, for example
"%1A"." */
+ if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+ return false; /* Lowercase hex needs normalization to uppercase */
+ }
+ pos += 2; /* Skip past the hex digits */
+ continue;
} else {
- /* lonely '%' should have been encoded with %25 according to the RFC
so likely not encoded */
+ /* Lone '%' or incomplete sequence - needs encoding as %25 */
return false;
}
}
+
+ /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+ return false;
}
- return false;
+ /* String is already in canonical form:
+ * - Only contains unreserved chars, slashes (for object names), or
+ * - Properly percent-encoded sequences with uppercase hex
+ * No decode/re-encode needed. */
+ return true;
}
String
canonicalEncode(const String &in, bool isObjectName)
{
- String canonical;
- if (!isUriEncoded(in, isObjectName)) {
- /* Not URI-encoded */
- canonical = uriEncode(in, isObjectName);
- } else {
- /* URI-encoded, then don't encode since AWS does not encode which is not
mentioned in the spec,
- * asked AWS, still waiting for confirmation */
- canonical = in;
+ if (isUriEncoded(in, isObjectName)) {
+ /* Fully URI-encoded with uppercase hex, return as-is */
+ return in;
}
+ /* Input needs normalization. This handles:
+ * 1. Unencoded strings - encode all reserved characters
+ * 2. Partially encoded - some chars encoded, some not (e.g., parentheses
vs brackets)
+ * 3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+ *
+ * Decode first to get the raw string, then re-encode with AWS canonical
rules.
+ *
+ * Example (mixed encoding):
+ * Input: /app/(channel)/%5B%5Bparts%5D%5D/page.js (parentheses not
encoded)
+ * Decode: /app/(channel)/[[parts]]/page.js
+ * Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *
+ * Example (lowercase hex):
+ * Input: /path/%5btest%5d/file.js (lowercase hex)
+ * Decode: /path/[test]/file.js
+ * Encode: /path/%5Btest%5D/file.js (uppercase hex)
+ */
+ String decoded = uriDecode(in);
+ String canonical = uriEncode(decoded, isObjectName);
return canonical;
Review Comment:
The canonicalEncode function doesn't handle the case of double-encoded URLs
(e.g., "%2528" which is an encoded "%28"). When such a URL is decoded once, it
becomes "%28", which then gets re-encoded to "%2528" again, resulting in the
original string. However, AWS might expect only a single level of decoding.
Consider adding test cases for double-encoded URLs to verify the behavior
matches AWS expectations, or document that double-encoding is not supported.
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
char c = in[pos];
if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
- /* found a unreserved character which should not have been be encoded
regardless
- * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. */
+ /* Unreserved characters don't need encoding:
+ * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
continue;
}
- if (' ' == c) {
- /* space should have been encoded with %20 if the string was encoded */
- return false;
- }
-
- if ('/' == c && !isObjectName) {
- /* if this is not an object name '/' should have been encoded */
- return false;
+ if ('/' == c && isObjectName) {
+ /* '/' is allowed unencoded in object names */
+ continue;
}
if ('%' == c) {
if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) &&
std::isxdigit(in[pos + 2])) {
- /* if string was encoded we should have exactly 2 hexadecimal chars
following it */
- return true;
+ /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE
hex digits.
+ * If lowercase hex is found, return false to trigger normalization.
+ * See:
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+ * "Letters in the hexadecimal value must be uppercase, for example
"%1A"." */
+ if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+ return false; /* Lowercase hex needs normalization to uppercase */
+ }
+ pos += 2; /* Skip past the hex digits */
+ continue;
} else {
- /* lonely '%' should have been encoded with %25 according to the RFC
so likely not encoded */
+ /* Lone '%' or incomplete sequence - needs encoding as %25 */
return false;
}
}
+
+ /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+ return false;
}
Review Comment:
Using std::isxdigit, std::isalnum, and std::islower with potentially signed
char values (when char is signed) can cause undefined behavior if the value is
negative or outside the range of unsigned char. The character being checked
should be cast to unsigned char before being passed to these functions. This
applies to multiple locations where in[pos], in[pos + 1], and in[pos + 2] are
used with these functions.
##########
plugins/origin_server_auth/aws_auth_v4.cc:
##########
@@ -127,48 +161,71 @@ isUriEncoded(const String &in, bool isObjectName)
char c = in[pos];
if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
- /* found a unreserved character which should not have been be encoded
regardless
- * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. */
+ /* Unreserved characters don't need encoding:
+ * 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~' */
continue;
}
- if (' ' == c) {
- /* space should have been encoded with %20 if the string was encoded */
- return false;
- }
-
- if ('/' == c && !isObjectName) {
- /* if this is not an object name '/' should have been encoded */
- return false;
+ if ('/' == c && isObjectName) {
+ /* '/' is allowed unencoded in object names */
+ continue;
}
if ('%' == c) {
if (pos + 2 < in.length() && std::isxdigit(in[pos + 1]) &&
std::isxdigit(in[pos + 2])) {
- /* if string was encoded we should have exactly 2 hexadecimal chars
following it */
- return true;
+ /* Valid percent-encoded sequence found. AWS SigV4 requires UPPERCASE
hex digits.
+ * If lowercase hex is found, return false to trigger normalization.
+ * See:
https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
+ * "Letters in the hexadecimal value must be uppercase, for example
"%1A"." */
+ if (std::islower(in[pos + 1]) || std::islower(in[pos + 2])) {
+ return false; /* Lowercase hex needs normalization to uppercase */
+ }
+ pos += 2; /* Skip past the hex digits */
+ continue;
} else {
- /* lonely '%' should have been encoded with %25 according to the RFC
so likely not encoded */
+ /* Lone '%' or incomplete sequence - needs encoding as %25 */
return false;
}
}
+
+ /* Any other character needs encoding (space, '(', ')', '[', ']', etc.) */
+ return false;
}
- return false;
+ /* String is already in canonical form:
+ * - Only contains unreserved chars, slashes (for object names), or
+ * - Properly percent-encoded sequences with uppercase hex
+ * No decode/re-encode needed. */
+ return true;
}
String
canonicalEncode(const String &in, bool isObjectName)
{
- String canonical;
- if (!isUriEncoded(in, isObjectName)) {
- /* Not URI-encoded */
- canonical = uriEncode(in, isObjectName);
- } else {
- /* URI-encoded, then don't encode since AWS does not encode which is not
mentioned in the spec,
- * asked AWS, still waiting for confirmation */
- canonical = in;
+ if (isUriEncoded(in, isObjectName)) {
+ /* Fully URI-encoded with uppercase hex, return as-is */
+ return in;
}
+ /* Input needs normalization. This handles:
+ * 1. Unencoded strings - encode all reserved characters
+ * 2. Partially encoded - some chars encoded, some not (e.g., parentheses
vs brackets)
+ * 3. Lowercase hex - convert %2f to %2F per AWS SigV4 requirement
+ *
+ * Decode first to get the raw string, then re-encode with AWS canonical
rules.
+ *
+ * Example (mixed encoding):
+ * Input: /app/(channel)/%5B%5Bparts%5D%5D/page.js (parentheses not
encoded)
+ * Decode: /app/(channel)/[[parts]]/page.js
+ * Encode: /app/%28channel%29/%5B%5Bparts%5D%5D/page.js
+ *
+ * Example (lowercase hex):
+ * Input: /path/%5btest%5d/file.js (lowercase hex)
+ * Decode: /path/[test]/file.js
+ * Encode: /path/%5Btest%5D/file.js (uppercase hex)
+ */
+ String decoded = uriDecode(in);
+ String canonical = uriEncode(decoded, isObjectName);
return canonical;
Review Comment:
The decode-then-reencode approach could potentially be exploited if an
attacker can inject URL-encoded data that normalizes to unexpected values. For
instance, overlong UTF-8 sequences or other encoding tricks could be used.
While AWS itself will normalize the URL, it's worth considering if there are
edge cases where the normalization here could differ from AWS's normalization,
leading to security issues. Consider adding explicit validation or documenting
the security assumptions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]