"Erik van der Poel" <[EMAIL PROTECTED]> writes:

> Looks good to me.
>
> Other than your interpretation of RFC 3490 leading to the insertion of
> 0x2E into a DNS label, but I guess you and I will simply have to agree
> that we disagree on this point. RFC 3490 should have been clearer.

I regard escaping 0x2E as the logical consequence of the IDNA design to
operate on single labels and of how U+2024 etc. behave under NFKC. I
think RFC 3490 didn't intend for ToASCII to be able to take one label
and output two labels. I suspect the reason for the problems here is
that there was a perception that ToASCII would never produce new 0x2E's.
But I can't say for sure.

> By the way, I did a Web search for "2024 nfkc" and found that this
> issue was raised, but I guess it was not resolved adequately:
>
> http://www.ops.ietf.org/lists/idn/idn.2001/msg02450.html

Interesting.

/Simon

> Erik
>
> On Jan 15, 2008 7:15 AM, Simon Josefsson <[EMAIL PROTECTED]> wrote:
>> "Erik van der Poel" <[EMAIL PROTECTED]> writes:
>>
>> > Yes, that's right.
>> >
>> > By the way, there may be a different way to address this issue. If
>> > libidn has a separate API for NFKC or Nameprep, the caller could pass
>> > the entire domain name (including all of the dots and dot-like
>> > characters) through NFKC (or Nameprep) first, and then call the normal
>> > IDNA routine. This is quite likely to behave the same way as MSIE 7
>> > and Firefox 2. If you choose this approach, you could simply document
>> > this somewhere, and callers could then decide whether or not to go
>> > this way.
>>
>> Libidn has a simple NFKC interface, and I'm documenting that approach
>> now. Below is the current text in the manual. I'll forward this to the
>> Firefox IDN guys to see if they are interested in documenting their
>> practice further, possibly in an I-D. If ToASCII(NFKC(i)) turns out to
>> actually work and behave better than RFC 3490, documenting that now
>> seems useful.
>>
>> Thanks,
>> /Simon
>>
>> Appendix B On Label Separators
>> ******************************
>>
>> Some strings contain characters whose NFKC normalized form contains the
>> ASCII dot (0x2E, "."). Examples of these characters are U+2024 (ONE
>> DOT LEADER) and U+248C (DIGIT FIVE FULL STOP).
>> These strings have the interesting property that their IDNA ToASCII
>> output will contain embedded dots. For example:
>>
>>      ToASCII (hi U+248C com) = hi5.com
>>      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
>>
>> This demonstrates the two general cases: the first example shows the
>> ASCII dot as part of an output that does not begin with the IDN prefix
>> "xn--". The second example illustrates the case where the dot is part
>> of an IDN prefixed with "xn--".
>>
>> The input strings are, from the DNS point of view, a single label.
>> The IDNA algorithm translates one label at a time. Thus, the output is
>> expected to be only one label. What is important here is to make sure
>> the DNS resolver receives the correct query. The DNS protocol does not
>> use the dot to delimit labels on the wire; rather, it uses length-value
>> pairs. Thus the correct queries would be for `{7}hi5.com' and
>> `{22}xn--rksmrgs.com-l8as9u' respectively.
>>
>> Some implementations (1) have decided that these input strings are
>> potentially confusing for the user. The string "hi U+248C com" looks
>> like "hi5.com" on systems that support Unicode properly. These
>> implementations do not follow RFC 3490. They yield:
>>
>>      ToASCII (hi U+248C com) = hi5.com
>>      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
>>
>> The DNS queries they perform are `{3}hi5{3}com' and
>> `{18}xn--rksmrgs-5wao1o{3}com' respectively. Arguably, this leads to a
>> better user experience, and suggests that the IDNA specification is
>> sub-optimal in this area.
>>
>> B.1 Recommended Workaround
>> ==========================
>>
>> It has been suggested to normalize the entire input string using NFKC
>> before passing it to IDNA ToASCII. You may use
>> `stringprep_utf8_nfkc_normalize' or `stringprep_ucs4_nfkc_normalize'.
>> This will avoid the problem, and appears to lead to behaviour similar
>> to that of IE/Firefox.
>>
>> Alternative workarounds are being considered.
>> Eventually Libidn may implement a new flag to the `idna_*' functions
>> that implements a recommended way to work around this problem.
>>
>> ---------- Footnotes ----------
>>
>> (1) Notably Microsoft's Internet Explorer and Mozilla's Firefox, but
>> not Apple's Safari.
>>

_______________________________________________
Help-libidn mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-libidn
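
The two query forms contrasted in the quoted appendix can be reproduced
with a short Python sketch. It is not libidn code: Python's
`unicodedata.normalize' stands in for `stringprep_utf8_nfkc_normalize',
the built-in "idna" codec stands in for per-label ToASCII, and the
helper names (`nfkc', `wire_notation', `single_label_query',
`workaround_query') are made up for illustration. The single-label
helper only handles inputs whose NFKC form is pure ASCII, so it covers
the hi5.com case but not the "xn--" case.

```python
import unicodedata


def nfkc(text):
    """Return the NFKC normalized form of text."""
    return unicodedata.normalize("NFKC", text)


def wire_notation(labels):
    """Render labels as length-prefixed pairs, e.g. {3}hi5{3}com.

    This mirrors the {N}label notation used in the appendix; it is a
    printable stand-in for the DNS wire format, not the raw bytes.
    """
    return "".join("{%d}%s" % (len(label), label) for label in labels)


def single_label_query(label):
    """Treat the whole input as ONE label, as the appendix describes.

    Assumption: the NFKC form of the input is pure ASCII, so no
    Punycode step is needed. Any dot produced by NFKC stays inside
    the label instead of becoming a separator.
    """
    return wire_notation([nfkc(label)])


def workaround_query(name):
    """Apply the B.1 workaround: NFKC the entire name, then split.

    Dots produced by NFKC become label separators, and each resulting
    label is converted on its own with Python's "idna" codec.
    """
    labels = nfkc(name).split(".")
    ascii_labels = [label.encode("idna").decode("ascii") for label in labels]
    return wire_notation(ascii_labels)


if __name__ == "__main__":
    tricky = "hi\u248ccom"  # "hi", U+248C DIGIT FIVE FULL STOP, "com"
    print(single_label_query(tricky))
    print(workaround_query(tricky))
    print(workaround_query("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

The first helper keeps the dot inside a single seven-octet label, while
the second produces the two-label query that the appendix attributes to
IE and Firefox.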
