Hello Adam,

Sorry for the delay. I'm splitting my answer into two. This one is on the host name vs. domain name question.

At 03:30 02/03/27 +0000, Adam M. Costello wrote:
>James Seng/Personal <[EMAIL PROTECTED]> wrote:
>
> > The discussion of the how URL is to be encoded and how Host: field are
> > to be handled is probably more relevant so lets get back to that.

Just to make sure that I don't get something wrong:

- Domain names are whatever can be used on the lookup side of a DNS
  query. This includes all kinds of current and potential uses besides
  the core use that people usually equate with the DNS.
- Host names are the names of machines. They are a subset of domain
  names, used in certain queries/records (e.g. A records).

>Okay. Eventually this message will arrive at the following proposal:
>
>    Proposed repertoire for internationalized *host* labels: All
>    characters in classes L (letter), M (mark), and N (number) are
>    allowed, and U+002D (hyphen-minus) is also allowed. Everything else
>    is forbidden.

This is a very good first shot. Some things have to be checked
carefully, e.g. whether some M (mark) characters have to be excluded,
or whether some signs that play a role similar to the hyphen-minus
should be allowed. Two examples I know of are the zero-width space,
which could be desirable for Farsi, and the (ideographic) middle dot;
several people in Japan have complained that the latter is not
available in XML names.

>Which characters should be allowed in internationalized host labels?
>This is an interesting question in its own right, and it's possible that
>the IESG will demand an answer.

>Notice that there is no conflict with Nameprep, because Nameprep does
>not prohibit any characters in classes L, M, or N.

I guess that if there were a conflict, host names would simply have
to satisfy the conditions on both sides.
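
To make the proposed repertoire concrete, here is a minimal sketch of the check in Python. It is only an illustration, not part of any draft: the function names are hypothetical, and the results depend on the Unicode version shipped with the Python interpreter rather than on whatever version a specification would pin down. It shows that the two characters mentioned above (zero-width space, middle dot) would be rejected by the rule as written.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"

def is_host_code_point(ch):
    """Proposed rule: U+002D, or general category L*, M*, or N*."""
    # unicodedata.category returns a two-letter code such as "Lo" or "Po";
    # the first letter is the major class.
    return ch == HYPHEN_MINUS or unicodedata.category(ch)[0] in "LMN"

def disallowed_in_host_label(label):
    """Return the code points in a label that the proposed rule forbids."""
    return ["U+%04X" % ord(ch) for ch in label if not is_host_code_point(ch)]

if __name__ == "__main__":
    samples = [
        "example-1",           # plain LDH label
        "a\u200bb",            # contains ZERO WIDTH SPACE
        "\u30a2\u30fb\u30a4",  # katakana with KATAKANA MIDDLE DOT
    ]
    for label in samples:
        print(repr(label), disallowed_in_host_label(label))
```

An empty list means the label stays within the proposed repertoire. Allowing either of the two characters would mean adding explicit exceptions next to U+002D.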

>If we were to adopt this definition of internationalized host name, it
>would best be understood as an amendment of ToASCII step 3 (which checks
>host name restrictions if applicable), tightening substep 3a from:
>
>    (a) Verify the absence of non-LDH ASCII code points; that is,
>        the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
>
>to:
>
>    (a) Verify that the sequence contains only host code points;
>        that is, U+002D (hyphen-minus) and code points classified
>        as L (letter), M (mark), or N (number). See appendix ? for
>        an enumeration of host code points.
>
>Or maybe the enumeration would go in Nameprep, or in a separate document
>that defines internationalized host names.

Looking back on my time working on Nameprep as a member of the design
team, I think the distinction between host names and domain names
wasn't clear, at least to me, and probably to several other
participants. At some point, I started to worry that allowing all the
symbols might not have been the best choice. Of course, if it's for
domain names, then that's a bit different.

Regards,   Martin.
