> The encoded values being passed around are supposed to -decode- as UCS
> character codes, so it would seem best to maintain consistency across all
> outputs. We do not want to make it more complex than necessary and
> defining multiple extraction profiles would make processing much more
> complex: "we promise that this encoding will represent a series of UCS
> characters" should be it. For that reason, my opinion is that any UCS
> character code (including those not yet assigned) should be valid for
> internationalized domain names (with nameprep providing the host name
> subset filter). Going that route incurs a responsibility to tell
> implementations that they have to be careful with data they process, but
> in truth we had that responsibility already given the broader exposure of
> the combining characters.

Right! As I said, we are not disagreeing :-)

But I think it is useful to continue this discussion, as I think it is
leading somewhere.

Definitions:

domain names - any 8-bit characters, usually (but not necessarily)
               US-ASCII
host names   - limited to LDH, no leading or trailing "-", delimited
               by ".", cannot have all digits-only labels

What is the definition of internationalized domain names and
internationalized host names? We seem to agree at least that

i18n domain names - any Unicode characters
i18n host names   - i18n domain names subjected to the prohibited list
                    defined in Section 5 of nameprep (host name
                    limitations, space characters, control characters,
                    private use, etc.)

But is an i18n host name sufficient for normal use? As a technical
implementation, maybe. For policy implementation, unlikely. Perhaps we
need a new term for that...

> I would submit that we are describing transfer encodings and their
> handling. The application media being used to transfer the encoded values
> provide seven- and eight-bit paths. For the sake of maximum efficiency
> with the applications that transfer and use the domain names, we should
> provide seven- and eight-bit encodings. The encapsulation constraints in
> DNS are difficult to work with but that does not change the above. We
> should not be defining mandatory seven-bit encodings for eight-bit
> applications especially if they are compliant with BCP18 for every unit of
> protocol and/or application data.

We differ in this.

The question of whether we need more than one CCS was answered a long
time ago. The choice is clearly ISO/IEC 10646.

TES is an encoding. CES is an encoding. I am asking if we need more
than one encoding (either TES or CES). That is my first question.

If the answer is that we need more than one encoding, then the next
question would be how many separate cases we have, i.e., how many
encodings do we need?

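To make the "how many encodings" question a little more concrete, here
is a rough Python sketch that counts the octets one label needs under
three character encoding schemes of ISO/IEC 10646, and checks whether
the result would survive a seven-bit path. The function names and the
sample labels are made up for illustration and are not taken from any
draft; treat it as a sketch, not a proposal.

```python
# Illustrative only: compare the octet cost of one label under several
# character encoding schemes (CES) of ISO/IEC 10646.
# The names below are hypothetical helpers, not part of any draft.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")


def octet_counts(label):
    """Return {encoding name: number of octets} for a single label."""
    return {name: len(label.encode(name)) for name in CANDIDATE_ENCODINGS}


def is_seven_bit_clean(data):
    """True if every octet fits in seven bits (usable on a 7-bit path)."""
    return all(octet < 0x80 for octet in data)


if __name__ == "__main__":
    # Sample labels: plain ASCII, Latin with a diaeresis, and Han.
    for label in ("example", "b\u00fccher", "\u4e2d\u6587"):
        utf8 = label.encode("utf-8")
        print(repr(label), octet_counts(label),
              "7-bit clean as UTF-8:", is_seven_bit_clean(utf8))
```

For the label "b\u00fccher" this gives 7 octets in UTF-8 against 24 in
UTF-32, and the UTF-8 form is not seven-bit clean, so it would still
need a transfer encoding on a seven-bit path.
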
(I could argue it is more "fair" to use UTF-32 in EDNS labels.)

Then we can start asking which is the appropriate encoding for each
case, i.e., your questions 2 and 3.

-James Seng
