on 6/9/2002 3:51 PM Adam M. Costello said the following:

> To say that a text string is
> "case-sensitive" is to say that changing characters from upper to lower
> case or vice-versa does change the identity of the string; in other
> words, case differences are not ignored when the string is compared.

That's right. And since the encoded representation of the original
string has to be case-neutral for it to work in the legacy label, the
input label has to be made case-specific. The input label cannot be
changed. This is regardless of whether or not it is converted to
lowercase before it is encoded; the label which actually gets encoded
is permanently made case-specific, at least in its unencoded form.

> For IDNA, we wanted the mapping between non-ASCII labels and ACE labels
> to have the following property: A case-insensitive comparison of
> two ACE labels always returns the same answer as a case-insensitive
> comparison of the two corresponding non-ASCII labels. In order
> to achieve that property, a case-folding step is essential in the
> definition of the ACE mapping. (Maybe you don't want that property, but
> it was a fundamental design goal of IDNA, in order to avoid surprising
> users with different comparison rules for ASCII versus non-ASCII names.)

We agree on history and partially agree on objectives. Let me explain
where we differ.

Specifically, I agree with the requirement to make *some domain names*
lowercase in order to facilitate simple comparisons. In particular, any
domain name which commonly represents a connection identifier (a
hostname) should conform to this requirement. Examples of this would
include the owner name of an A RR, the RRdata of MX and NS RRs, and most
of the other domain names which are commonly used for connection
identifiers.

However, there are other domain names where this is not required. For
example, consider that Apple might want to encode the NBP name of AFP
servers within a zone as some kind of iNBP RR which is linked to an
atalk zone name.
There is absolutely no reason that the iNBP RR owner name or any of the
RRdata elements must be mangled. Apple could choose to do so, but there
is no reason that we should require it. Similar arguments can be made
for NetBIOS names, NetWare SAP entries, NIS domains, and so forth. There
are plenty of other examples where this kind of facility needs to be
supported, but those are obvious candidates most of us are familiar
with.

What we need in order to support those kinds of applications is to
separate nameprep from IDNA. Specifically, IDNA needs to apply encoding
against any label which suits the requirements (UCS character code
inputs which result in valid STD13 output), regardless of the stringprep
profile in use. Then the applications which create and parse the domain
names are the only ones that need to understand the stringprep profile
in use for that specific domain name.

The ridiculous part here is that under the existing STD13 rules, these
RRs can be used simply by defining an interpretation for the octets. For
example, Microsoft already provides a direct encoding of NetBIOS names
into UTF-8 and simply applies their own interpretation to the RRs. Under
your rules, they couldn't use i18n domain names as effectively as STD13
domain names since they would have to sacrifice capitalization in the
process.

Get it? i18n domain names need to have *at least* the same flexibility
as STD13 if they are to be adopted by the community. If not, then
interoperability will be harmed, because people will continue using
STD13 labels and doing their own thing.

I think you are missing a key concept here, which is that all of the RRs
are going to need i18n definitions, and IDNA alone won't do it. When the
RR rules are defined, they will get stringprep profiles assigned to them.
At that point, the applications which create and interpret the unencoded
labels are the only ones that need to know anything, and the
infrastructure can store, transfer, compare and convert the opaque
octets.

DM-IDNS-00 also tried to define a global namespace. It doesn't work. The
only way out is to use per-RR rules.

>>If somebody needs an RR that preserves case, there's no reason they
>>shouldn't be able to do so.
>
> Agreed. And I suggest two ways they might do this: (1) Define a
> case-sensitive data format and don't call it a domain name. (2) Define
> a mapping from non-ASCII domain labels to ASCII domain labels that
> doesn't involve case-folding. But don't call this mapping IDNA, because
> it's not. It might look very similar to IDNA, and might reuse pieces of
> it, but it shouldn't use the IDNA ACE prefix for a mapping that is not
> the IDNA ACE mapping.

(1) Anything which fits in the i18n namespace is a domain name,
regardless of whether or not IDNA can handle it. The problem is that
IDNA is trying to mandate the namespace according to the requirements of
nameprep, when there is no technological argument for doing so.

(2) Defining alternative type-specific codecs hinders deployment, and is
unnecessary.

Thus the rule is not only arbitrary, but it also hinders
interoperability.

> heard reports of some new DNS servers that try to guess the charset,
> in which case they might then do case-insensitive comparisons even for
> the non-ASCII characters. And there's still the wide world of entities
> other than DNS servers, which also compare domain names, and their
> handling of 8-bit names is even less predictable.

Non-argument. The EDNS label is there to prevent this confusion. What
the owner of a zone does with the STD13 octets is up to them.

>>>Now consider an entity that knows that föo and FÖO and xx--fo-fka
>>>and xx--FO-ohA are domain labels, but does not know that they are
>>>special labels that don't use Nameprep.
>>
>>Why would it ask for a special RR that it doesn't know how to read?
>
> Because it might be a caching DNS server.

Caches and replication servers have no need to understand the
capitalization or normalization rules in use with a particular domain
name. Only the nodes that create and interpret the domain names need to
know anything about the contents. Caches and replication servers only
need to understand the layout of the message.

> If you only care about the end applications, which know the special
> semantics of the special labels, then just use a different prefix to
> go with your different Stringprep profile. Then you can be sure that
> entities that know IDNA but don't know about your special labels won't
> accidentally muck with them.

No, that won't work. That requires the resolvers, caches and replication
servers to understand special rules about the domain names before the
application can be deployed. Essentially, this requires the
infrastructure to be upgraded for every new domain name which gets
defined.

There is absolutely no reason for this. It's a ridiculous artifact of an
arbitrary design. Decouple IDNA from nameprep and the problem is solved.

> I'm not sure what you mean by "encoding form". The ACE form (which
> involves both Nameprep and Punycode) is not guaranteed to be reversible
> to the original capitalization.

Okay, that's a problem. May have to use something else entirely.

--
Eric A. Hall                    http://www.ehsco.com/
Internet Core Protocols         http://www.oreilly.com/catalog/coreprot/

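The capitalization point that closes this exchange can be checked with a
minimal sketch in Python, using only the standard library's "punycode"
and "idna" codecs. The labels are the ones from the quoted example. The
two helper names are hypothetical and exist only for illustration. The
stdlib "idna" codec implements the IDNA 2003 ToASCII step (Nameprep
followed by Punycode) and uses the "xn--" prefix, where the thread
writes "xx--". This is an illustration under those assumptions, not the
mapping either author proposed.

```python
# Sketch: Punycode with and without a Nameprep step, stdlib codecs only.

def ace_without_nameprep(label: str) -> bytes:
    """Hypothetical helper: Punycode-encode a label, no case folding."""
    return label.encode("punycode")


def ace_with_nameprep(label: str) -> bytes:
    """Hypothetical helper: IDNA 2003 ToASCII (Nameprep, then Punycode)."""
    return label.encode("idna")


lower, upper = "föo", "FÖO"

# Punycode alone keeps the two labels distinct and round-trips the
# original capitalization.
raw_lower = ace_without_nameprep(lower)
raw_upper = ace_without_nameprep(upper)
print(raw_lower, raw_upper)
print(raw_upper.decode("punycode") == upper)

# The ACE strings quoted in the thread decode to the same two labels.
print(b"fo-fka".decode("punycode"), b"FO-ohA".decode("punycode"))

# With Nameprep in front, both labels fold to one ACE form, so the
# original capitalization cannot be recovered from it.
ace_lower = ace_with_nameprep(lower)
ace_upper = ace_with_nameprep(upper)
print(ace_lower == ace_upper)
print(ace_upper.decode("idna") == upper)
```

The first half is the "decoupled" behaviour argued for above: the codec
treats the label as opaque input and leaves case alone. The second half
is the folding behaviour that the quoted design goal depends on.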