Because, e.g., on the ext3 file system you can have two files named "ü" in the same directory: one named using the single character "ü" and one named using the two-character string "u¨" (both in UTF-8). If you make the compiler automatically normalise everything, you lose information (and get security holes, etc.).
I see, but since this is not handled decently with good old ANSIStrings anyway, there is no "friendly old-school" way that a compiler would be able to offer. In these special cases, the user of course needs to explicitly handle the upgrade of his project to Unicode.

OTOH, in this special case I don't see why the compiler should "normalize" "u¨" to "ü". If the software is supposed to handle Unicode, the Unicode string "u¨" should be considered perfectly legal two-code-point data consisting of a "u" (a single byte in UTF-8) and a combining diaeresis, i.e. the double dot (two bytes in UTF-8). If the user wants to handle this as a single "ü", he should write appropriate code for that. Any automation of that is dangerous.
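
To make the distinction concrete, here is a minimal sketch in Python (not FPC code; the helper names describe and same_after_nfc are made up for illustration). It assumes the decomposed form is "u" followed by U+0308 COMBINING DIAERESIS, and it shows that the two spellings stay distinct unless the program asks for normalization explicitly:

```python
import unicodedata

precomposed = "\u00fc"   # "ü" as a single code point (U+00FC)
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS (U+0308)


def describe(s):
    """Return the code points and UTF-8 bytes of s (hypothetical helper)."""
    return {
        "code_points": [f"U+{ord(c):04X}" for c in s],
        "utf8_bytes": s.encode("utf-8").hex(" "),
    }


def same_after_nfc(a, b):
    """Compare two strings only after an explicit NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(describe(precomposed))
print(describe(decomposed))
# A plain comparison keeps the two spellings apart; nothing is lost.
print(precomposed == decomposed)
# Treating them as the same "ü" is an explicit choice made by the caller.
print(same_after_nfc(precomposed, decomposed))
```

The point of the sketch is only that normalization is a call the user makes on purpose, so both file names above remain addressable until he decides otherwise.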

-Michael
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
