[Issue 15382] std.uri has an incorrect set of reserved characters

d-bugmail--- via Digitalmars-d-bugs Sun, 24 Jan 2021 14:44:32 -0800

https://issues.dlang.org/show_bug.cgi?id=15382


Stefan <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |[email protected]
         Resolution|INVALID                     |---

--- Comment #3 from Stefan <[email protected]> ---
According to § 2.2 of RFC 3986 [1], URI characters fall into the
following classes:

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

The code in phobos/std/uri.d defines a different set of character
classes instead:

     62         uflags['#'] |= URI_Hash;
     66             uflags[c] |= URI_Alpha;
     67             uflags[c + 0x20] |= URI_Alpha;   // lowercase letters
     69         foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;
     70         foreach (c; ";/?:@&=+$,")   uflags[c] |= URI_Reserved;
     71         foreach (c; "-_.!~*'()")    uflags[c] |= URI_Mark;

If encodeComponent is used, URI_Encode is invoked with
unescapedSet = URI_Alpha | URI_Digit | URI_Mark. Because URI_Mark
contains characters that RFC 3986 classifies as reserved, some
reserved characters are not encoded, e.g. ! or (.
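
For comparison, here is a minimal sketch of a component encoder that
leaves only the RFC 3986 unreserved set unescaped. The names
isUnreserved3986 and encodeComponent3986 are hypothetical and not part
of Phobos; this only illustrates the intended behaviour and is not a
proposed patch.

```d
import std.array : appender;
import std.ascii : isAlphaNum;

/// True for the RFC 3986 "unreserved" set:
/// ALPHA / DIGIT / "-" / "." / "_" / "~".
/// (Hypothetical helper, not a Phobos function.)
bool isUnreserved3986(char c) pure nothrow @nogc @safe
{
    return isAlphaNum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/// Percent-encodes every UTF-8 code unit of `s` that is not unreserved.
/// (Hypothetical helper, not a Phobos function.)
string encodeComponent3986(string s)
{
    static immutable hex = "0123456789ABCDEF";
    auto result = appender!string();
    foreach (char c; s)            // iterate over code units, not code points
    {
        if (isUnreserved3986(c))
        {
            result.put(c);
        }
        else
        {
            result.put('%');
            result.put(hex[c >> 4]);   // high nibble
            result.put(hex[c & 0x0F]); // low nibble
        }
    }
    return result.data;
}

unittest
{
    // '!' is 0x21, '(' is 0x28, ')' is 0x29: all are sub-delims.
    assert(encodeComponent3986("a!(b)") == "a%21%28b%29");
    // Unreserved punctuation passes through unchanged.
    assert(encodeComponent3986("a-b._~") == "a-b._~");
}
```

With this definition, ! and ( are escaped because they are not in the
unreserved set, while - . _ ~ are left alone.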

The notion of mark characters stems from the obsoleted RFC 2396 [2].
RFC 3986 explains the changes in its Appendix D.2 [3].
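
To make the difference concrete, the following sketch computes which
RFC 2396 mark characters are no longer unreserved in RFC 3986. The
constants are copied from the grammars quoted above; the names
(rfc2396Mark, movedToReserved, and so on) are made up for this
illustration.

```d
// Character sets taken from the two RFCs (identifiers are hypothetical).
enum rfc2396Mark            = "-_.!~*'()";   // same string as URI_Mark in uri.d
enum rfc3986UnreservedPunct = "-._~";        // unreserved minus ALPHA / DIGIT
enum rfc3986SubDelims       = "!$&'()*+,;=";

/// True if `c` occurs in `set`.
bool contains(string set, char c) pure nothrow @nogc @safe
{
    foreach (char s; set)
        if (s == c)
            return true;
    return false;
}

/// Mark characters that RFC 3986 does not list as unreserved.
string movedToReserved() pure nothrow @safe
{
    string result;
    foreach (char c; rfc2396Mark)
        if (!contains(rfc3986UnreservedPunct, c))
            result ~= c;
    return result;
}

// Evaluated at compile time: five former marks are affected ...
static assert(movedToReserved() == "!*'()");

// ... and each of them is an RFC 3986 sub-delim, i.e. reserved.
static assert(() {
    foreach (char c; movedToReserved())
        if (!contains(rfc3986SubDelims, c))
            return false;
    return true;
}());
```

These five characters are exactly the ones that URI_Mark still treats
as safe to leave unescaped.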

[1] https://tools.ietf.org/html/rfc3986#section-2
[2] https://tools.ietf.org/html/rfc2396#section-2.3
[3] https://tools.ietf.org/html/rfc3986#appendix-D.2
