Yes, that's right. By the way, there may be a different way to address this issue. If libidn has a separate API for NFKC or Nameprep, the caller could pass the entire domain name (including all of the dots and dot-like characters) through NFKC (or Nameprep) first, and then call the normal IDNA routine. This is quite likely to behave the same way as MSIE 7 and Firefox 2. If you choose this approach, you could simply document it somewhere, and callers could then decide whether or not to go this way.
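A minimal sketch of that pre-normalization idea, written in Python rather than against libidn's C API, so it does not show any real libidn call. It assumes Python's standard `unicodedata` module for NFKC and the built-in `idna` codec (an IDNA 2003 implementation) as a stand-in for "the normal IDNA routine"; `to_ascii_prenormalized` is a hypothetical helper name. Note that `unicodedata` follows the Unicode version bundled with Python, whereas Nameprep is defined against Unicode 3.2, so results could differ for some code points.

```python
import unicodedata


def to_ascii_prenormalized(domain: str) -> str:
    """Hypothetical helper: NFKC the whole name first, then run IDNA ToASCII."""
    # Normalizing the entire name lets dot-like characters decompose into
    # U+002E *before* the name is split into labels.
    normalized = unicodedata.normalize("NFKC", domain)
    # Python's built-in "idna" codec splits on dots and converts each label.
    return normalized.encode("idna").decode("ascii")


if __name__ == "__main__":
    # U+2024 ONE DOT LEADER between "example" and "com"
    print(to_ascii_prenormalized("example\u2024com"))
    # U+248C DIGIT FIVE FULL STOP decomposes to "5" followed by "."
    print(to_ascii_prenormalized("hi\u248ccom"))
```

With this ordering, a dot-like character ends up acting as a label separator, which is the behavior attributed to MSIE 7 and Firefox 2 here, rather than staying inside a single label.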
Erik

> >> I'm not yet sure whether actually providing a mechanism (like the
> >> one I proposed in the patch) to work around the problem is a good thing.
> >> The mechanism could just as well cause other problems.
> >
> > Yes, it is possible that that approach would cause other
> > incompatibility problems that I cannot think of at the moment, since
> > it is different from MSIE 7 and Firefox 2.
>
> Indeed. I've thought a bit about this, and there are some problems with
> my patch:
>
> 1) It only treats U+2024 as a dot. There are other code points as well,
> but none are as simple as U+2024. The others include:
>
> 2024;ONE DOT LEADER;Po;0;ON;<compat> 002E;;;;N;;;;;
> 2025;TWO DOT LEADER;Po;0;ON;<compat> 002E 002E;;;;N;;;;;
> 2026;HORIZONTAL ELLIPSIS;Po;0;ON;<compat> 002E 002E 002E;;;;N;;;;;
> 2488;DIGIT ONE FULL STOP;No;0;EN;<compat> 0031 002E;;1;1;N;DIGIT ONE PERIOD;;;;
> 2489;DIGIT TWO FULL STOP;No;0;EN;<compat> 0032 002E;;2;2;N;DIGIT TWO PERIOD;;;;
> ...
> 2498;NUMBER SEVENTEEN FULL STOP;No;0;EN;<compat> 0031 0037 002E;;;17;N;NUMBER SEVENTEEN PERIOD;;;;
> ...
> 249B;NUMBER TWENTY FULL STOP;No;0;EN;<compat> 0032 0030 002E;;;20;N;NUMBER TWENTY PERIOD;;;;
> 33C2;SQUARE AM;So;0;L;<square> 0061 002E 006D 002E;;;;N;SQUARED AM;;;;
> 33C7;SQUARE CO;So;0;L;<square> 0043 006F 002E;;;;N;SQUARED CO;;;;
> 33D8;SQUARE PM;So;0;L;<square> 0070 002E 006D 002E;;;;N;SQUARED PM;;;;
> FE52;SMALL FULL STOP;Po;0;CS;<small> 002E;;;;N;SMALL PERIOD;;;;
>
> It would be incorrect to treat all of these as dots as well. For
> example:
>
> ToASCII(hi U+248C com) = hi5.com
>
> If we extended my patch to cover U+248C, libidn would generate 'hi.com'
> instead of 'hi5.com'.
>
> Right now, both Firefox and libidn translate the input into the ASCII
> string hi5.com. Arguably Firefox is incorrect (wrt the RFC) in that it
> treats the string as two labels rather than one.
>
> 2) As you say, the patch is different from what MSIE/Firefox really
> implement. The only advantage with a new flag in libidn (that I see)
> would be if it does exactly the same as MSIE/Firefox. But it doesn't.
>
> Thus, my patch seems to be the wrong thing, and I'm not going to install
> it now.
>
> If someone wants to work on a patch against libidn that makes it
> implement the MSIE/Firefox algorithm, when a new IDNA flag is given,
> that would be something we could seriously consider applying. I'm
> currently too busy to do this on a pro-bono basis though.
>
> Thanks,
> /Simon
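
The dot-like code points discussed in the quoted message can be approximated by scanning for characters whose NFKC form contains U+002E. A rough sketch in Python, assuming the standard `unicodedata` module (which tracks a newer Unicode version than the 3.2 tables Nameprep uses, so the exact set may differ from the list quoted above); `dot_like_code_points` is a hypothetical helper name:

```python
import unicodedata


def dot_like_code_points(limit: int = 0x10000) -> list:
    """Hypothetical helper: code points below `limit` whose NFKC form contains '.'."""
    found = []
    for cp in range(limit):
        if cp == 0x2E:
            continue  # skip FULL STOP itself
        if "." in unicodedata.normalize("NFKC", chr(cp)):
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in dot_like_code_points():
        ch = chr(cp)
        nfkc = unicodedata.normalize("NFKC", ch)
        print(f"U+{cp:04X} {unicodedata.name(ch)} -> {nfkc!r}")
```

The output makes the problem with a "treat it as a dot" mapping visible: U+2024 normalizes to a bare ".", but U+248C normalizes to "5.", so replacing it with a plain dot would drop the digit.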
