Paul Hoffman / IMC <[EMAIL PROTECTED]> wrote:

> In draft -03, it says:
>
> For domain names containing non-ASCII characters, the Nameprep
> specification ([Nameprep]) defines some mappings, which mainly
> include normalization to NFKC and folding to lower case. When
> encoding an internationalized domain name in an URI, these
> mappings SHOULD NOT be applied. It should be assumed that the
> domain name is already normalized as far as appropriate.
>
> Why the "SHOULD NOT"?

Indeed, I also see no need for that recommendation.

> An alternate wording for the last two sentences would be:
>
> When encoding an internationalized domain name in an URI, these
> mappings do not need to be applied if the domain name is already
> normalized as far as appropriate.

Am I supposed to perform some test to determine whether the domain
name is already normalized as far as appropriate? No. I find that
clause confusing and unnecessary. I think the only point of this
paragraph is that domain names in URIs are not necessarily
Nameprepped. That's all it needs to say.

Draft -03 says:

> For domain names containing non-ASCII characters, the legal
> domain names are those for which the ToASCII operation ([IDNA],
> [Nameprep]; using the unescaped UTF-8 values as input), with the flags
> "UseSTD3ASCIIRules" and "AllowUnassigned" set, is successful. The
> URI resolver MUST apply any steps required as part of domain name
> resolution by [IDNA], in particular the ToASCII operation, with the
> above-mentioned flags set.

URI resolvers should indeed set AllowUnassigned, but URI resolvers
aren't the only things that use URIs. Consider a program that creates
HTML documents. The domain names in the URIs it writes into those
documents are stored strings, and Stringprep requires that unassigned
code points be prohibited in stored strings, so AllowUnassigned would
have to be unset in that situation.

Finally, I'd like to point out an important caveat: Even if the URI
generic syntax is updated to allow non-ASCII characters (escaped) in
the host field, that doesn't mean you can actually put non-ASCII
domain names into any URI you please. The IDNA rules still apply. If
you know that a URI is occupying an IDN-aware slot (for example, if it
appears in a new version of HTML that refers to the IDNA spec, or if
it appears in an old-HTML document but new HTTP features are used to
negotiate IDN-awareness), then you're free to put non-ASCII domain
names (escaped) into the URI.
But otherwise IDNA section 3.1 requirement 2 applies, and the domain
name must contain only ASCII characters. The rationale is to prevent
non-ASCII domain names from falling into the unwitting hands of old
software that will choke on them. It might be prudent for the idn-uri
document to remind the reader of this.

AMC
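
P.S. To make the stored-string versus query distinction concrete, here
is a rough Python sketch, not a reference implementation. The names
host_to_ascii and has_unassigned are made up for illustration. It
relies on Python's standard "idna" codec, which, as far as its
documentation says, behaves as if AllowUnassigned were set and
UseSTD3ASCIIRules were not, so the unassigned check is done separately
with the stringprep module and the STD3 check is left out entirely.

```python
import stringprep
from urllib.parse import unquote


def has_unassigned(text: str) -> bool:
    """True if any code point is in RFC 3454 table A.1 (unassigned in Unicode 3.2)."""
    return any(stringprep.in_table_a1(ch) for ch in text)


def host_to_ascii(escaped_host: str, stored: bool) -> str:
    """Unescape a %-escaped UTF-8 host, then apply ToASCII to each label.

    stored=True models a stored string (AllowUnassigned unset), e.g. a
    URI written into an HTML document. stored=False models a query, as
    a URI resolver would issue.
    """
    host = unquote(escaped_host)  # UTF-8 is unquote's default encoding
    if stored and has_unassigned(host):
        raise ValueError("unassigned code point in a stored string")
    # The codec splits on dots and runs IDNA2003 ToASCII on every label.
    return host.encode("idna").decode("ascii")
```

With this, host_to_ascii("b%C3%BCcher.example", stored=True) gives
"xn--bcher-kva.example", while a host containing U+0221 (unassigned in
Unicode 3.2) is rejected when stored=True. A generator of HTML would
take the stored=True path; a resolver would take the other one.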
