Hi Ignace,

>  upon further inspection and verification of RFC3986 I also see an issue
> with the example used for normalization in the RFC. According to RFC3986 (
> https://www.rfc-editor.org/rfc/rfc3986.html#section-3.2.2) :
>
>    The reg-name syntax allows percent-encoded octets in order to
>    represent non-ASCII registered names in a uniform way that is
>    independent of the underlying name resolution technology.  Non-ASCII
>    characters must first be encoded according to UTF-8 [STD63
>    <https://www.rfc-editor.org/rfc/rfc3986.html#ref-STD63>], and then
>    each octet of the corresponding UTF-8 sequence must be percent-
>    encoded to be represented as URI characters.  URI producing
>    applications must not use percent-encoding in host unless it is used
>    to represent a UTF-8 character sequence.  When a non-ASCII registered
>    name represents an internationalized domain name intended for
>    resolution via the DNS, the name must be transformed to the IDNA
>    encoding [RFC3490 <https://www.rfc-editor.org/rfc/rfc3490>] prior to
>    name lookup.
>
> From this we can infer that:
>
> - Host encoding can only happen for UTF-8 sequence but in your example
> "ex%61mple.com" is used which is not conforming to the rules (ie it should
> throw an InvalidUriException IMHO for the Uri class) I presume for WhatWg
> URL it will get correctly converted with a soft error (??).
>
Oh, that's a very interesting catch again. If your interpretation is
correct, then I think there must also be a bug in the parser library,
but I have to dig into the code first, or reach out to its author. :)
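
For reference, here is a rough Python sketch of what I understand the
RFC 3986 normalization (sections 6.2.2.1 and 6.2.2.2) to do with a host
like "ex%61mple.com": percent-encoded octets that map to unreserved
characters are decoded, and the remaining ones only get their hex digits
uppercased. The function name decode_unreserved is made up for this
illustration, and this is not the parser library's actual code.

```python
import re

# Unreserved characters as listed in RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(component: str) -> str:
    """Hypothetical helper: decode only the percent-encoded unreserved chars."""

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # e.g. %61 -> "a"
            return char
        # Anything else stays encoded, with uppercase hex digits.
        return "%" + match.group(1).upper()

    return _PCT_TRIPLET.sub(replace, component)


print(decode_unreserved("ex%61mple.com"))
print(decode_unreserved("b%c3%bccher.example"))
```

Under this reading, "%61" is simply decoded to "a" during normalization,
while the UTF-8 octets of a non-ASCII character stay percent-encoded.
Whether the input should have been rejected in the first place is a
separate question that this sketch does not answer.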

I have some suspicion, though, that the "URI producing applications" part
may not apply to this case; at least I have a hard time deciding what
this expression really means. The RFC also uses "URI reference parsers",
which is a really straightforward term, while "URI producers" isn't. For
example, there is a paragraph in the RFC:

> URI producers and normalizers should omit the ":" delimiter that
> separates host from port if the port component is empty. Some schemes do
> not allow the userinfo and/or port subcomponents.

Clearly, omitting ":" is not done at parse time, but when a URI
(reference) is produced. So I find it possible that "URI producing"
means when the URI string is created, not when the URI is parsed,
although the RFC usually uses URI and URI reference consistently. So I'm
not sure. Maybe it's a typo, and it should have been "URI normalizers".
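
To illustrate the "produce time" reading, here is a tiny Python sketch
of a producer that follows the quoted rule about the ":" delimiter. The
function build_authority and its parameters are hypothetical, only
meant to show that the rule applies when the string is assembled, not
when it is parsed.

```python
def build_authority(host: str, port: str = "") -> str:
    """Hypothetical producer step: join host and port into an authority."""
    if port == "":
        # Empty port: omit the ":" delimiter, as the RFC recommends.
        return host
    return f"{host}:{port}"


print(build_authority("example.com"))
print(build_authority("example.com", "8080"))
```

A parser, on the other hand, can still be handed "example.com:" as
input, so the rule says nothing about what parsing should accept.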

Regards,
Máté
