"Masahiro Sekiguchi" <[EMAIL PROTECTED]> writes: >> The basic attack: Alice runs on host that uses Latin-1 for >> input/output and enters www.�bank.com (where � is 8859-1 0xB5). The >> domain is registered using U+00B5, but Alice's application transcode >> the string using U+03BC. Either Alice can't connect (if the other >> domain doesn't exist) or she ends up talking to someone else (if the >> other domain does exist). > > I agree the case you described is a problem. However, I don't > agree on the point you state is the cause, i.e., I don't think > it is a transcoding problem. > > Please imagine that we are living in a ideal Unicode-only world. > > Assume the bank registers its domain name using Unicode U+00B5, > intending "micro-bank." Alice *may* type a key for U+00B5 is > she is a computer engineer, but she may type U+03BC in if she is > a Greek linguist, because her keyboard (or input mapper) will be > optimized for Greek typing, or because her thinking is biased by > her Greek familiarity (She probably read the name as "mu bank", > being puzzled what it means.) > > Someone might say this is a Unicode problem. Well, partly. For > this particular case, Unicode could have eliminated one of > U+00B5 and U+03BC. However, there are a lot of similar cases: l > and 1, 0 and O, � and ', or � and o, even in the 8859-1 range. > We can't eliminate all of these similar lookings. > > Hence, I consider the basic problem is in our writing systems > and I don't think it's feasible to fix them.
I agree it isn't feasible to fix them, so that problem cannot be
solved. That problem should probably be mentioned in the security
considerations as well. The user gets what she enters, and if she
enters something other than what she expects to enter, there will be
errors that can have security implications. The same problem exists
today: if you enter "mybank.com" instead of "mubank.com", no technical
aid can protect you from someone calling herself "mybank.com" setting
up a web site that looks similar to "mubank.com", including server
certs etc.

I think the transcoding issue is separate though. The security
implications in the scenario above can be solved by having educated
users. They must remember the exact spelling and exact characters used
to contact their bank (this isn't unreasonable). However, when the
system uses transcoding to convert system characters into Unicode
characters, even a user entering the "correct" spelling cannot be
certain that she ends up at the right server, because transcoding
algorithms are not specified by IDNA and are left to implementations.
The user can even look at the string she entered and the string found
in a certificate, and it is possible for them to match,
octet-by-octet, with her "correct" string, and still she is talking to
the wrong server because different mapping tables exist.

When transcoding algorithms are left unspecified, the only way for the
user to verify the identity of the bank is to compare the computed
IDNA string with what she wanted. She enters the string using system
characters, it is converted into IDNA, the server is contacted and a
certificate is fetched. Now, to be certain of the identity, the
application will likely compare the IDNA of the server with the one in
the certificate. But the user also needs to compare the computed IDNA
with something she knows, to be certain that the application didn't
use a transcoding algorithm different from what the bank used, the CA
used, and the user intended.

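To make the transcoding case concrete, here is a minimal sketch in
Python. The two mapping tables are hypothetical (I am not claiming any
particular system ships either one), and the helper names are made up
for illustration. The sketch deliberately applies no normalization or
mapping step before Punycode encoding, so it shows only the raw effect
of two transcoders disagreeing on one octet.

```python
# Hypothetical transcoding tables: both convert the same local octets to
# Unicode, but they disagree on 0xB5 (the micro sign in ISO 8859-1).
TABLE_A = {0xB5: "\u00b5"}  # MICRO SIGN
TABLE_B = {0xB5: "\u03bc"}  # GREEK SMALL LETTER MU


def transcode(raw: bytes, table: dict) -> str:
    """Map each octet to Unicode; octets not in the table map to chr(octet)."""
    return "".join(table.get(b, chr(b)) for b in raw)


def to_ace_label(label: str) -> str:
    """Punycode-encode one label with an 'xn--' prefix.

    Assumption: no normalization or case mapping is applied here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ace(name: str) -> str:
    """Encode each dot-separated label of a domain name."""
    return ".".join(to_ace_label(part) for part in name.split("."))


def matches_expected(computed: str, expected: str) -> bool:
    """The check the user would need: computed name vs. the name she knows."""
    return computed == expected


# The octets Alice types are identical in both cases.
typed = b"www.\xb5bank.com"

name_a = to_ace(transcode(typed, TABLE_A))
name_b = to_ace(transcode(typed, TABLE_B))

print(name_a)
print(name_b)
print("same name on the wire:", name_a == name_b)
```

If the bank published the encoded form of its name, matches_expected()
is the comparison the user (or her application) would have to make
against the computed name. Comparing the octets she typed is not
enough, since `typed` is the same in both runs.
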
If transcoding algorithms were specified by IDNA, the second security
problem would be reduced to the first one (modulo any mistakes in
transcoding mapping tables -- once the tables are fixed, you can't
modify them unless you want to enable the attack again).

Instead, it could be easier to just ignore the second security
problem, assuming that all implementations will transcode system
characters into Unicode characters in the same way. Or that the
problem is so rare in practice that it doesn't matter. Or that the
whole world will switch to Unicode. Or that I misunderstood everything
and there isn't a problem at all.

Either way, all I'm asking is that the problem and the expected
solution be discussed a bit further in the specification, so that I
can understand how to implement IDNA securely on my Latin-1 machine.

>> Suggested modified security consideration below. It essentially says
>> that unless everyone switches to UTF-8, IDNA will enable new attacks
>> that have security implications.
>
> Mentioning the security implications is good. Blaming it on
> transcoding is irrelevant. Revolutionizing the world to use
> UTF-8 only doesn't completely eliminate the problem, IMHO.

As illustrated above, I believe there are two separate problems. It
might be good to make both of them explicit.
