"Eric A. Hall" <[EMAIL PROTECTED]> wrote:

> The i18n namespace is case-sensitive because of the AMC-Z encoding,
> not because of nameprep. The original capitalization has to be burned
> to suit the encoding.
>
> As a result, all i18n domain names (unencoded) must be compared as
> case-specific data, by requirement of the codec.

You have totally lost me. First, what does "case-specific" mean?

I'll start by explaining exactly what "case-sensitive" and "case-insensitive" mean (as I understand the terms). To say that a text string is "case-insensitive" is to say that changing characters from upper to lower case or vice-versa does not change the identity of the string; in other words, case differences are ignored when the string is compared to other strings. To say that a text string is "case-sensitive" is to say that changing characters from upper to lower case or vice-versa does change the identity of the string; in other words, case differences are not ignored when the string is compared.

Neither term says anything about whether the string is capable or incapable of preserving mixed-case text; that is an orthogonal question. So there are four possibilities: A string can be case-sensitive and case-preserving (like Unix file names), or case-insensitive and case-preserving (like Macintosh and Amiga file names), or case-insensitive and non-case-preserving (like MS-DOS file names), or case-sensitive and non-case-preserving (I can't think of any real-world examples).

For IDNA, we wanted the mapping between non-ASCII labels and ACE labels to have the following property: A case-insensitive comparison of two ACE labels always returns the same answer as a case-insensitive comparison of the two corresponding non-ASCII labels. In order to achieve that property, a case-folding step is essential in the definition of the ACE mapping. (Maybe you don't want that property, but it was a fundamental design goal of IDNA, in order to avoid surprising users with different comparison rules for ASCII versus non-ASCII names.)

The codec defined by IDNA has a number of steps, of which the two biggest are Nameprep and Punycode, but it's all a single codec.
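
As a rough illustration of that property (a sketch, not part of the IDNA spec), the following uses Python's built-in "punycode" and "idna" codecs. The labels and the helper names (ace_equal, punycode_only, to_ace) are made up for the example, and Python's codec emits the "xn--" prefix that was eventually assigned to IDNA. It suggests why the case-folding step matters: Punycode alone gives two labels that a case-insensitive ASCII comparison treats as different, while the full codec gives one label for both spellings.

```python
# Sketch only: relies on Python's standard "punycode" and "idna" codecs.
# Helper names and the example labels are hypothetical.

def ace_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive comparison of two ASCII (ACE) labels."""
    return a.lower() == b.lower()

def punycode_only(label: str) -> bytes:
    """Punycode step alone: no Nameprep, hence no case folding."""
    return label.encode("punycode")

def to_ace(label: str) -> bytes:
    """Full codec: Nameprep (including case folding), Punycode, ACE prefix."""
    return label.encode("idna")

lower, upper = "f\u00f6o", "F\u00d6O"  # two spellings differing only in case

# Punycode alone: the encoded forms differ by more than ASCII case.
print(punycode_only(lower), punycode_only(upper))
print(ace_equal(punycode_only(lower), punycode_only(upper)))  # False

# Nameprep followed by Punycode: both spellings map to the same ACE label.
print(to_ace(lower), to_ace(upper))
print(ace_equal(to_ace(lower), to_ace(upper)))  # True
```
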
Punycode is broken out as a separate step because its implementation is independent of the other steps and because Punycode might be useful for things other than domain names. But for IDNs, Punycode is not the codec, it is merely one step of the codec, and Nameprep is another step of the codec. IDNA nowhere suggests the use of case-sensitive comparisons between domain names in any form.

> If somebody needs an RR that preserves case, there's no reason they
> shouldn't be able to do so.

Agreed. And I suggest two ways they might do this:

(1) Define a case-sensitive data format and don't call it a domain name.

(2) Define a mapping from non-ASCII domain labels to ASCII domain labels that doesn't involve case-folding. But don't call this mapping IDNA, because it's not. It might look very similar to IDNA, and might reuse pieces of it, but it shouldn't use the IDNA ACE prefix for a mapping that is not the IDNA ACE mapping.

> Labels are currently stored, transferred and compared as
> octet-streams, with the exception being that ASCII A-z is compared as
> case-insensitive.

That's how most DNS servers handle 8-bit labels in practice, but I still don't think they're required to do so; I just think it's the best effort they can make given that they don't know the charset. But we've heard reports of some new DNS servers that try to guess the charset, in which case they might then do case-insensitive comparisons even for the non-ASCII characters. And there's still the wide world of entities other than DNS servers, which also compare domain names, and their handling of 8-bit names is even less predictable.

> you are arguing for mandatory lowercasing for storage and transfer in
> addition to comparison.

Yes, because that's the only way to achieve that property I mentioned above.

> > Now consider an entity that knows that föo and FÖO and xx--fo-fka
> > and xx--FO-ohA are domain labels, but does not know that they are
> > special labels that don't use Nameprep.
>
> Why would it ask for a special RR that it doesn't know how to read?

Because it might be a caching DNS server. If you only care about the end applications, which know the special semantics of the special labels, then just use a different prefix to go with your different Stringprep profile. Then you can be sure that entities that know IDNA but don't know about your special labels won't accidentally muck with them.

> > Nameprep can of course produce identical output for two distinct
> > inputs. But for two distinct outputs of Nameprep, ToASCII cannot
> > produce the same output.
>
> Is the encoding form guaranteed to always be reversible to the original
> capitalization?

I'm not sure what you mean by "encoding form". The ACE form (which involves both Nameprep and Punycode) is not guaranteed to be reversible to the original capitalization. There is a mechanism called "mixed-case annotation", described in Appendix B of the Punycode spec, which can remember any capitalization you want (except for titlecase), but the ToASCII and ToUnicode operations described in the IDNA spec neither create nor apply these annotations. It's possible to write a conformant ToASCII that creates them, and a conformant ToUnicode that applies them, but the IDNA spec does not mention this possibility, mainly because the first two authors have never been convinced that it's really safe or effective. :) (I originally proposed this mechanism as a way to make IDNs case-preserving like ASCII domain names are, but I long ago gave up on that issue.)

There is no analogous mechanism for recovering a non-normalized string.

Punycode by itself can encode any arbitrary sequence of non-negative integers, and always decodes to exactly the same sequence. AMC
