Soobok Lee <[EMAIL PROTECTED]> wrote:

> UTF-8 forms make subset of the entire set of non-ASCII forms.
> Thus, the utf8-compliant subset has been under the overall length
> restriction imposed by RFC1035 on the entire set.
UTF-8 data stored directly in 8-bit DNS labels would be subject to the 63-octet limit. This is irrelevant to IDNs, because IDNs do not store UTF-8 data directly in 8-bit DNS labels. IDNA requires that internationalized labels use their 7-bit ASCII form in DNS.

If someday you want to use UTF-8 forms of internationalized labels directly in newDNS, you will need to make sure that newDNS allows more than 63 octets per label. Or you could use the UTF-8 form when it fits, and fall back to the ASCII form when UTF-8 doesn't fit. (Or you could decide it's easier to stick with ASCII in the DNS protocol, and create the illusion of UTF-8 using a new resolver on the client.)

Your argument seems to be:

1. An internationalized label in UTF-8 form is a sequence of octets.
2. RFC 1035 limits labels to 63 octets.
3. Therefore internationalized labels must have no more than 63 octets in UTF-8 form.

But you could try the same argument for UTF-16, and EUC-KR, and iso-2022-jp, and BIG5, etc. Do we conclude that any string that uses more than 63 octets in any encoding cannot be an internationalized label? That would be absurd.

Perhaps the key to understanding this is to recognize that 8-bit DNS labels are not internationalized labels. IDNA makes no use of them. Neither IDNA nor DNS defines any textual interpretation for them. They are just opaque binary data (except for the values <= 127, which are ASCII characters). We have no way of deciding whether 8-bit labels are UTF-8 or ISO-8859-1 or EUC-JP, etc. Until the DNS standard is updated to assign them some semantics, they are none of the above.

IDNA created some brand new kinds of labels that had never existed before: non-ASCII textual labels. They have never appeared in DNS, cannot appear in DNS, and will not be able to appear in DNS unless DNS is updated to support them (because the only text supported by today's DNS is ASCII).
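To make the encoding point concrete, here is a minimal Python sketch (an illustration only, not part of IDNA) that measures one label in several encodings. It assumes Python 3's built-in `idna` codec, which implements the IDNA 2003 ToASCII operation. The sample label (22 copies of the Hangul syllable U+AC00), the helper names `octet_lengths` and `wire_form`, and the "UTF-8 if it fits, else ASCII" policy in `wire_form` are hypothetical choices made for this example; no standard defines such a newDNS policy.

```python
# Sketch: the same internationalized label has a different octet length in
# every encoding, but only its ASCII (ACE) form is ever placed in today's DNS.
# Uses Python's built-in "idna" codec (IDNA 2003 ToASCII: nameprep + Punycode).

MAX_LABEL_OCTETS = 63  # RFC 1035 limit on a single DNS label


def octet_lengths(label: str) -> dict:
    """Return the length in octets of `label` under several encodings."""
    return {
        "utf-8": len(label.encode("utf-8")),
        "utf-16-be": len(label.encode("utf-16-be")),
        "euc-kr": len(label.encode("euc_kr")),
        "idna (ACE)": len(label.encode("idna")),
    }


def wire_form(label: str) -> bytes:
    """Hypothetical newDNS policy (not defined by any standard):
    send raw UTF-8 when it fits in one label, otherwise fall back
    to the IDNA ASCII form."""
    utf8 = label.encode("utf-8")
    if len(utf8) <= MAX_LABEL_OCTETS:
        return utf8
    return label.encode("idna")


if __name__ == "__main__":
    label = "\uac00" * 22  # 22 copies of HANGUL SYLLABLE GA (U+AC00)
    for name, n in octet_lengths(label).items():
        verdict = "fits" if n <= MAX_LABEL_OCTETS else "too long"
        print(f"{name:>11}: {n:2d} octets ({verdict})")
    print(wire_form(label))
```

For this label the UTF-8 form is 66 octets, over the limit, while UTF-16 and EUC-KR need 44 and the ASCII form is 29 (`xn--` plus 25 Punycode characters), so `wire_form` falls back to the ASCII form. The octet count depends entirely on which encoding you pick; the one form that actually goes into DNS is well within 63 octets.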
These new non-ASCII textual labels are outside the universe of labels defined by RFC 1035, and therefore the RFC 1035 length restriction does not apply to them directly (although it does apply to their corresponding ASCII forms).

AMC
