Paul Hoffman / IMC wrote:
> The difference between host labels which are covered by IDNA and
> domain labels which are not must be well-defined before IDNA is
> finished. The -04 draft takes a step towards that, but it is not
> complete. Suggestions for specific wording for the draft on this
> topic are *greatly* appreciated.

All domain names are unstructured eight-bit sequences; host names are a
specific subset of that range. Host names are the exception, domain names
are the rule. Treating domain names as the exception results in the above
problem.

This isn't a simple block of text...

The draft I'm working on punts on the problem cases cited: labels which
only contain characters in the range 0x00 through 0x7E must only be encoded
as STD13 octet sequences and UTF-8, while domain names that have any
eight-bit value in the label are to be encoded as STD13 octets, ACE and
UTF-8 equally. If a server is unable to choose between STD13 and ACE output
encoding, it favors ACE on the assumption that an eight-bit label is more
likely to be Latin-1 text than an arbitrary eight-bit code, and that ACE
has useful downstream processing characteristics (an ACE name can be used
as the CNAME for a host) whereas STD13 octet encoding does not.

This is definitely a punt which is guaranteed to fail in more than one
scenario. Some sort of group decision needs to be made on this at some
point; ambiguous matches in DNS are not cool.

Note that UTF-8 does not suffer this ambiguity, since it doesn't overload a
shared label: if the query arrived as UTF-8, the canonical UCS character is
encoded as UTF-8 and returned for the recipient to decode, so there is no
ambiguity as to which encoding should be used. Nor does it matter whether
the client wanted a STD13 binary domain name or an internationalized domain
name, because the two are indistinguishable in this particular encoding
scenario; they asked for a code point in a specific encoding and we comply.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
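The encoding policy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the draft's text: every name here (Encoding, is_host_label, permitted_encodings, choose_output_encoding) is hypothetical, ACE is treated as an opaque label (no ACE algorithm is modeled), the host-label test uses the RFC 952/1123 letters-digits-hyphen rule, and a query whose encoding cannot be determined is represented as None. Because the policy as stated covers 0x00 through 0x7E and eight-bit values, a label containing 0x7F but no eight-bit octets is rejected rather than guessed at.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Encoding(Enum):
    STD13 = "STD13 octets"  # raw label octets as carried by STD13 (RFC 1034/1035)
    ACE = "ACE"             # ASCII-compatible encoding; the transform itself is not modeled
    UTF8 = "UTF-8"


# Letters, digits and hyphen: the RFC 952/1123 host name alphabet.
_LDH = frozenset(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def is_host_label(label: bytes) -> bool:
    """True if the label is also a valid host name label (the subset case)."""
    if not 1 <= len(label) <= 63:
        return False
    if any(octet not in _LDH for octet in label):
        return False
    # RFC 1123 allows a leading digit but not a leading or trailing hyphen.
    return label[0] != 0x2D and label[-1] != 0x2D


def permitted_encodings(label: bytes) -> FrozenSet[Encoding]:
    """Output encodings the described policy allows for a stored label."""
    if all(octet <= 0x7E for octet in label):
        return frozenset({Encoding.STD13, Encoding.UTF8})
    if any(octet >= 0x80 for octet in label):
        return frozenset({Encoding.STD13, Encoding.ACE, Encoding.UTF8})
    # A label containing 0x7F but no eight-bit octets falls between the two rules.
    raise ValueError("label not covered by the described policy")


def choose_output_encoding(label: bytes, query_encoding: Optional[Encoding]) -> Encoding:
    """Pick the answer encoding; query_encoding is None when it cannot be determined."""
    if query_encoding is Encoding.UTF8:
        # UTF-8 queries get UTF-8 answers: no shared label, so no ambiguity.
        return Encoding.UTF8
    allowed = permitted_encodings(label)
    if query_encoding in allowed:
        return query_encoding
    # Tie-break from the policy: prefer ACE over STD13 octets when it is permitted.
    return Encoding.ACE if Encoding.ACE in allowed else Encoding.STD13
```

For example, a Latin-1 label such as b"caf\xe9" queried with no determinable encoding comes back as ACE, while the same label queried as UTF-8 comes back as UTF-8. The ACE tie-break is exactly the punt described above, so the sketch inherits its failure cases.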
