"Adam M. Costello" wrote:
> Are you saying that ToASCII is good, and ToUnicode is bad?

I am saying that legacy applications need access to IDN namespaces, but
that modifying well-known and widely-used data-types in order to render
domain names in Unicode form is foolish. We have to separate domain
names from the data-types that also use them; the two do not need to be
conjoined. This means that legacy applications, protocols and data-types
which use STD13 names must only be presented with the STD13 form of the
IDNs. The i18n form of those names must only be presented to the
applications, protocols and data-types which can make use of an i18n
domain name. Incorporating this distinction into the current concepts
isn't all that easy because of the cross-breeding of ideas and
objectives in the docs.

What I would like to see is the current IDNA spec turned into a codec
definition with guidance on implementation. This means deleting (or
re-scoping) section "3. Requirements", deleting section "6.4 Avoiding
exposing users to the raw ACE encoding", and adding a new section for
"Implementation considerations". The new text should essentially state
that domain names which are used by applications, protocol messages and
data-formats MUST be passed and displayed in LDH form, except where the
governing specification has explicitly defined an IDN behavior for the
affected domain name, and that the use of ToUnicode is expressly
prohibited if the governing specification has not defined how and where
that function will be deployed.

It should also be stated that the "governing specification" will often
be the local software specifications, such as the man pages for
named.conf, ping, or whatever. These will govern domain names which are
used as connection identifiers, and which are not used for protocol
messages or standardized data-types.

> I can imagine a world with ToASCII but without ToUnicode.
> If a non-ASCII name came to you via new protocols that support
> non-ASCII names directly, then you'd see the human-friendly form. But
> if the name traversed an old protocol (at any point), you'd see the
> ACE.

That would be true for new protocol messages and/or data-types that had
to traverse the old namespace, yes. It would also assume that the new
messages and/or data-types provided mechanisms for storing the IDNs in
some kind of raw form (e.g., UTF-8), and that they defined a conversion
point which said "do the conversion here".

> I don't see how doing away with ToUnicode would solve any problem. The
> main danger is non-ASCII names getting accidentally fed to old software.
> Even without ToUnicode, non-ASCII names would still be out there (being
> carried around by the new protocols and new applications), and it would
> still be possible for them to get pasted or piped into old programs.

If the sanctity of the existing data-types is preserved, that won't
break anything (or not as much, anyway), since only the new data-types
will be able to use native IDNs.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
