Hi Ignace,
> upon further inspection and verification of RFC3986 I also see an issue
> with the example used for normalization in the RFC. According to RFC3986
> (https://www.rfc-editor.org/rfc/rfc3986.html#section-3.2.2):
>
> The reg-name syntax allows percent-encoded octets in order to
> represent non-ASCII registered names in a uniform way that is
> independent of the underlying name resolution technology. Non-ASCII
> characters must first be encoded according to UTF-8 [STD63
> <https://www.rfc-editor.org/rfc/rfc3986.html#ref-STD63>], and then
> each octet of the corresponding UTF-8 sequence must be
> percent-encoded to be represented as URI characters. URI producing
> applications must not use percent-encoding in host unless it is used
> to represent a UTF-8 character sequence. When a non-ASCII registered
> name represents an internationalized domain name intended for
> resolution via the DNS, the name must be transformed to the IDNA
> encoding [RFC3490 <https://www.rfc-editor.org/rfc/rfc3490>] prior to name
> lookup.
>
> From this we can infer that:
>
> - Host encoding can only happen for UTF-8 sequence but in your example
> "ex%61mple.com" is used which is not conforming to the rules (ie it should
> throw an InvalidUriException IMHO for the Uri class) I presume for WhatWg
> URL it will get correctly converted with a soft error (??).
>

Oh, that's a very interesting catch again. If your interpretation is correct, then I think it must also be a bug in the parser library, but I have to dig into the code first, or reach out to its author. :)

I have some suspicion though that the "URI producing applications" part may not apply to this case; at least I have a hard time deciding what this expression really means. The RFC also uses "URI reference parsers", which is a straightforward term, while "URI producers" isn't.
For example, there is a paragraph in the RFC:

> URI producers and normalizers should omit the ":" delimiter that separates
> host from port if the port component is empty. Some schemes do not allow
> the userinfo and/or port subcomponents.

Clearly, omitting ":" is not done at parse time, but when a URI (reference) is produced. So I find it possible that "URI producing" means the moment when the URI string is created, not when the URI is parsed, although the RFC usually uses "URI" and "URI reference" consistently. So I'm not sure. Maybe it's a typo, and it should have been "URI normalizers".

Regards,
Máté
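
To make the two RFC passages discussed above concrete, here is a minimal Python sketch. It is only an illustration of the strict reading of section 3.2.2 and of the empty-port rule, not a description of how the PHP Uri class, the parser library, or a WHATWG URL implementation behaves. The helper names `host_pct_encoding_is_utf8_non_ascii` and `omit_empty_port` are hypothetical.

```python
import re

# One percent-encoded triplet, e.g. "%61" or "%C3"
PCT = re.compile(r"%([0-9A-Fa-f]{2})")
# A run of consecutive triplets, e.g. "%C3%A9"
PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def host_pct_encoding_is_utf8_non_ascii(host: str) -> bool:
    """Strict reading of RFC 3986 section 3.2.2 (an assumption):
    percent-encoding in a host is acceptable only if the encoded octets
    are valid UTF-8 and every encoded octet belongs to a non-ASCII character."""
    for run in PCT_RUN.finditer(host):
        octets = bytes(int(h, 16) for h in PCT.findall(run.group()))
        if any(b < 0x80 for b in octets):
            return False  # e.g. %61 is just ASCII "a"
        try:
            octets.decode("utf-8")
        except UnicodeDecodeError:
            return False  # encoded octets are not a UTF-8 sequence
    return True


def omit_empty_port(authority: str) -> str:
    """Producer/normalizer step: drop a trailing ":" when the port is empty."""
    return authority[:-1] if authority.endswith(":") else authority
```

Under this strict reading, "ex%61mple.com" would be flagged, while a host such as "%C3%A9xample.com" would pass. Whether the check belongs in a parser or only in a "URI producing application" is exactly the open question above, so the sketch takes no position on where it should run.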