Erik Nordmark <[EMAIL PROTECTED]> wrote:

> > an internationalized label can represent at most 63 code points,
> > whether it's ACE or not. A given encoding uses a bounded number of
> > octets per code point, so you can allocate your buffers based on
> > that.
>
> 63 code points is presumably a conservative number. Given the 4 octet
> ACE prefix you can only fit a 59 octets worth of punycode output
> per label, hence presumably 59 code points is a tighter limit for
> non-ASCII internationalized labels while 63 code points is the limit
> for ASCII labels.

True, but which limit you care about depends on the encoding. For
example, if you're using UTF-32, then a regular ASCII label can have 63
code points each occupying 4 octets.

Soobok Lee <[EMAIL PROTECTED]> wrote:

> IDNA section 6.1 goes further than that by allowing _protocols_ to use
> non-ACE labels which are not presentation forms nor textual labels,
> but protocol elements. What if future ESMTP allows utf8 encodings in
> RCPT: headers ?

Then applications that implement future ESMTP will need to be prepared
for UTF-8 labels to contain more than 63 octets. This is not a problem,
because any application that can even think about using non-ASCII labels
is aware of IDNA, and therefore knows the definition of internationalized
label, and therefore knows that the maximum possible label length depends
on the encoding used.

Soobok Lee <[EMAIL PROTECTED]> wrote:

> They will find an utf8 label may have 168 octets, contrary to RFC1035.

There is no contradiction. RFC 1035 says nothing about UTF-8 labels.
The RFC 1035 limit of 63 octets per label applies to the universe of
labels that RFC 1035 defined. IDNA defines some new labels outside that
universe (each of which is equivalent to a label inside that universe,
for backward compatibility). If you want to know the maximum possible
length of these new labels that were created by IDNA, don't bother
looking at RFC 1035, because it can't possibly tell you, because it
doesn't even know about the new labels. Look at IDNA, which contains
the complete definition of internationalized label.

> When IDNA draft granted utf8 label use in application protocols,
> it is natural that it should have also specified utf8 label length
> restrictions.

It did, by defining internationalized label as anything that ToASCII can
be applied to without failing. From this you can easily conclude that
internationalized labels, when encoded in UTF-8, can exceed 63 octets,
but cannot exceed 63*4 octets.
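
To make the buffer arithmetic concrete, here is a minimal sketch in Python. It is illustration only, not anything IDNA specifies: the names loose_label_bound and is_internationalized_label are hypothetical, the per-encoding maxima are the usual 4-octet ceilings for UTF-8, UTF-16 and UTF-32, and Python's built-in "idna" codec (which implements the RFC 3490 ToASCII operation) is assumed to be an adequate stand-in for ToASCII.

```python
# Loose worst-case buffer sizes for a single internationalized label.
# Assumption: at most 63 code points per label, as argued in this thread.
MAX_CODE_POINTS = 63

# Maximum octets one code point can occupy in each encoding.
MAX_OCTETS_PER_CODE_POINT = {
    "ascii": 1,
    "utf-8": 4,
    "utf-16": 4,   # a surrogate pair
    "utf-32": 4,
}

# Characters the "idna" codec treats as label separators.
_DOTS = ".\u3002\uff0e\uff61"


def loose_label_bound(encoding: str) -> int:
    """Loose (not tight) upper bound, in octets, for one label."""
    return MAX_CODE_POINTS * MAX_OCTETS_PER_CODE_POINT[encoding]


def is_internationalized_label(label: str) -> bool:
    """Approximate test: the label is one that ToASCII accepts.

    Assumption: Python's "idna" codec stands in for ToASCII. The codec
    works on whole names, so empty input and separators are rejected here.
    """
    if not label or any(dot in label for dot in _DOTS):
        return False
    try:
        label.encode("idna")
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    for name in MAX_OCTETS_PER_CODE_POINT:
        print(name, loose_label_bound(name))
    print(is_internationalized_label("b\u00fccher"))
    print(is_internationalized_label("a" * 64))
```

A buffer sized with loose_label_bound("utf-8") is larger than strictly necessary, but it is always sufficient for any label that passes the ToASCII test.
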
A tight upper bound is trickier to figure out, but you don't need it in
practice.

> So, 1024 or 768 bytes are good. But those utf8 FQDN cannot be put
> into single UDP packet of DNS response/query. This will constrain
> future DNS protocol update efforts around utf8 supports in wire
> format. Today's long iDNs may be one of the obstacles in the way to
> the effort.

That will indeed be an issue that any UTF-8 DNS protocol update will
need to address.

AMC
