Dan Oscarsson <[EMAIL PROTECTED]> wrote:

> - The count of characters that can fit into 63 octets differ when
>   using ACE-names and native UCS-names.
True. As an extreme example, consider a label consisting of many
repetitions of the same character outside plane 0: UTF-8, UTF-16, and
UTF-32 all use 4 octets per character, while Punycode uses about 1. As
an extreme example the other way, consider a label consisting of random
characters from plane 0: UTF-16 uses 2 octets per character, while
Punycode uses about 3.5.

> To make things easier for the future, IDNA should require that the IDN
> in the ToUnicode form must not be longer than 63 octets.

ToUnicode does not output octets; it outputs code points. Which
encoding form did you have in mind: UTF-8, UTF-16, or UTF-32?

UTF-32 is always at least as large as UTF-16, and sometimes larger, so
I'll assume you don't want that one.

If you go with UTF-16, then all existing ASCII labels over 31 characters
become retroactively invalid, which seems very bad.

If you go with UTF-8, then Indian scripts can fit only 21 characters per
label (each such character takes 3 octets in UTF-8), versus about 40 for
ACE. It seems a shame to halve the limit for a billion users.

I'd really rather not.

AMC
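The octet arithmetic above is easy to check. As a rough sketch, Python's
standard library ships a "punycode" codec implementing RFC 3492 (note it
omits the "xn--" ACE prefix, so an actual ACE label would be 4 octets
longer than the Punycode count shown here):

```python
def octet_counts(label: str) -> dict:
    """Octet length of one label under each encoding discussed above."""
    return {
        "UTF-8": len(label.encode("utf-8")),
        "UTF-16": len(label.encode("utf-16-be")),   # no BOM
        "UTF-32": len(label.encode("utf-32-be")),   # no BOM
        "Punycode": len(label.encode("punycode")),  # ACE would add 4 for "xn--"
    }

# Many repetitions of one character outside plane 0 (U+1D11E):
# all three UTFs need 4 octets per character; Punycode approaches 1.
print(octet_counts("\U0001D11E" * 14))

# A Devanagari label: 6 code points, 3 octets each in UTF-8,
# so only 21 such characters fit in 63 octets.
print(octet_counts("\u0928\u092E\u0938\u094D\u0924\u0947"))
```

The label strings here are only illustrative; the point is that the
ratio between the Punycode column and the UTF columns swings in both
directions depending on the script.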
