"Eric A. Hall" <[EMAIL PROTECTED]> wrote: > If you are going to be moving SOME of the prohibited characters from > nameprep to the IDNA hostname processing, then you need to move ALL of > them at that stage.
There are some code points that should be prohibited in all internationalized textual domain names. The private use code points, noncharacter code points, surrogate codes, and left-to-right mark are some of the best examples of such code points. These prohibitions do indeed belong in nameprep.

There are some code points that are prohibited in host names, but not in all textual domain names. The underscore is the best example. These prohibitions belong in ToASCII.

It's not always clear which side of the line particular code points fall on. The least clear-cut are the ASCII prohibitions: 0..20 and 7F. Feel free to offer some arguments.

> At the very least, you should consolidate the prohibited characters
> into IDNA, as the prohibited characters which appear to be in nameprep
> are in fact valid for STD13 domain names.

In the broadest sense an STD13 domain name can contain arbitrary binary data. Nameprep is not intended for domain names in this broadest sense. It is intended for domain names composed of internationalized *text*. It is appropriate for nameprep to prohibit things that are difficult to interpret as text. Therefore, the prohibition of ASCII control characters doesn't worry me much.

The prohibition of ASCII space worries me a little. Notice that nameprep doesn't prohibit dots in domain labels, even though dots are usually used to delimit labels, and are therefore tricky to put into labels. In fact RFC 1035 shows how to get dots into labels for the purpose of representing email addresses that contain dots in the local part. If nameprep doesn't prohibit dots, why should it prohibit spaces, which are also allowed in email address local parts?

I support the prohibition of all other whitespace characters, because it would be nasty to distinguish between different kinds of whitespace, but I'm not so confident about the prohibition of ASCII space. That one could stand some more scrutiny.

AMC
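
To make the split above concrete, here is a minimal Python sketch of the two-stage classification. The function names and the "undecided" bucket are illustrative assumptions, not part of nameprep or IDNA, and the code-point sets cover only the examples named in this message (reading "0..20 and 7F" as hexadecimal), not the full prohibition tables.

```python
import unicodedata

LEFT_TO_RIGHT_MARK = 0x200E


def is_noncharacter(cp):
    """True for Unicode noncharacters: U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(ch):
    """Return the stage (hypothetical labels) that would prohibit one code point.

    "nameprep"  : prohibited in all internationalized textual domain names
    "toascii"   : prohibited in host names only (non-letter/digit/hyphen ASCII)
    "undecided" : ASCII 0x00..0x20 and 0x7F, the cases still open to argument
    "allowed"   : not covered by any of the examples in this sketch
    """
    cp = ord(ch)
    category = unicodedata.category(ch)
    # Private use (Co), surrogates (Cs), noncharacters, left-to-right mark.
    if category in ("Co", "Cs") or is_noncharacter(cp) or cp == LEFT_TO_RIGHT_MARK:
        return "nameprep"
    # ASCII controls, space, and DEL: unclear which side of the line they fall on.
    if cp <= 0x20 or cp == 0x7F:
        return "undecided"
    # Remaining ASCII that is not a letter, digit, or hyphen (e.g. underscore).
    if cp < 0x80 and not (ch.isalnum() or ch == "-"):
        return "toascii"
    return "allowed"


if __name__ == "__main__":
    for sample in ["a", "_", ".", " ", "\u200e", "\ue000"]:
        print(f"U+{ord(sample):04X} -> {classify(sample)}")
```

Under these assumptions the dot and the underscore land in the same host-name-only bucket, while the space sits in the undecided bucket, which is exactly the asymmetry questioned above.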
