Roozbeh Pournader <[EMAIL PROTECTED]> writes:

>> The basic attack: Alice runs on host that uses Latin-1 for
>> input/output and enters www.µbank.com (where µ is 8859-1 0xB5). The
>> domain is registered using U+00B5, but Alice's application transcode
>> the string using U+03BC. Either Alice can't connect (if the other
>> domain doesn't exist) or she ends up talking to someone else (if the
>> other domain does exist).
>
> I'm sorry, but your example doesn't work. In nameprep, when doing Unicode
> Normalization, U+00B5 is mapped to U+03BC. So these will be the same
> domain name, and have the same ACE label.

You are right. What about other examples?

ISO-8859-1 0xB5:    U+00B5 / U+03BC: Mapped to U+03BC as you indicate
ISO-8859-1 0xC5:    U+00C5 / U+212B: Mapped to U+00C5
CP437 0xE1:         U+03B2 / U+00DF: ?
CP437 0xEA:         U+03A9 / U+2126: Mapped to U+03A9
CP437 0xEE:         U+03B5 / U+2208: ?
JIS-X-0208 0x2140:  U+005C / U+FF3C: ?

"?" means I could not find any KC normalization in the Unicode tables at
http://www.unicode.org/charts/normalization/, and I'm not sure how to
interpret this. Possibly it means the two code points are not normalized
to the same character, in which case there is a problem?

I agree with Mark Davis that it would be interesting to find out which,
and how many, characters in commonly used legacy charsets may cause
these problems.

Also note that if these tables are ever changed in the future, this
could also be exploited: application A uses mapping table version Y and
application B uses mapping table version Y+1, and the two transcode
and/or normalize characters differently. In this case someone could
register either the old or the new domain and fool either new or old
applications.
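
One way to check pairs like those listed above mechanically, rather than by reading the charts, is to ask a Unicode library whether both candidate code points come out identical under normalization form KC. The following is only a rough sketch under stated assumptions: it uses Python's standard `unicodedata` module, whose tables are whatever Unicode version ships with the interpreter (not necessarily the version nameprep refers to), and it tests plain NFKC only, not the additional mapping and case-folding steps of nameprep. The names `PAIRS` and `unified_by_nfkc` are made up for this illustration.

```python
import unicodedata

# Candidate pairs from the list above: two plausible Unicode targets
# for the same legacy-charset byte (or byte pair).
PAIRS = [
    ("ISO-8859-1 0xB5", "\u00b5", "\u03bc"),
    ("ISO-8859-1 0xC5", "\u00c5", "\u212b"),
    ("CP437 0xE1", "\u03b2", "\u00df"),
    ("CP437 0xEA", "\u03a9", "\u2126"),
    ("CP437 0xEE", "\u03b5", "\u2208"),
    ("JIS-X-0208 0x2140", "\u005c", "\uff3c"),
]


def unified_by_nfkc(a, b):
    """Return True if a and b are identical after NFKC normalization.

    This checks normalization only; it is not a full nameprep
    implementation (no case folding, no prohibited-character checks).
    """
    nfkc_a = unicodedata.normalize("NFKC", a)
    nfkc_b = unicodedata.normalize("NFKC", b)
    return nfkc_a == nfkc_b


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for label, a, b in PAIRS:
        verdict = "same after NFKC" if unified_by_nfkc(a, b) else "DIFFERENT after NFKC"
        print(f"{label}: U+{ord(a):04X} / U+{ord(b):04X}: {verdict}")
```

Pairs reported as different are the ones where the transcoding choice alone decides which domain name Alice ends up with. Running the same script under interpreters that ship different Unicode data versions would also be a crude way to look for the table-version skew described above.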
