On 11 Nov 2008, at 13:15, Michael Schnell wrote:
> OTOH, in this special case, I don't see why the compiler should
> "normalize" "u¨" to "ü". If the software is supposed to be handling
> unicode, the unicode string "u¨" should be considered a perfectly
> legal two-code-point information consisting of a "u" (a single
> subcode in UTF-8) and a double-dot (supposedly two subcodes in UTF-8).
Note that I was simplifying. It's not actually "u¨", but "u" followed
by the code point meaning "put ¨ on top of the preceding character".
In other words, there are three distinct cases (all in UTF-8):
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by
"COMBINING DIAERESIS", which is encoded as $CC $88
c) "u¨": "LATIN SMALL LETTER U", encoded as $75, followed by
"DIAERESIS", which is encoded as $C2 $A8
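The byte sequences above can be checked with a short script. This is only an illustrative sketch in Python (not something from the FPC sources); the helper name utf8_hex and the variable names are made up for the example, and only the standard unicodedata module is assumed.

```python
import unicodedata

# The three cases from the list above, written as explicit code points.
precomposed = "\u00fc"   # a) LATIN SMALL LETTER U WITH DIAERESIS
combining = "u\u0308"    # b) LATIN SMALL LETTER U + COMBINING DIAERESIS
spacing = "u\u00a8"      # c) LATIN SMALL LETTER U + DIAERESIS


def utf8_hex(text):
    """Return the UTF-8 encoding of text as Pascal-style hex bytes."""
    return " ".join("$%02X" % byte for byte in text.encode("utf-8"))


for label, text in (("a", precomposed), ("b", combining), ("c", spacing)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, utf8_hex(text), names)
```

Cases a) and b) render identically on most systems, which is why the code points are spelled out as escapes instead of being typed literally.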
> If the user wants to handle this as a single "ü", he should write
> appropriate code for that. Any automation on that is dangerous.
The character combination actually literally means "ü" in both cases.
It's not a decision of a user whether or not it means "ü".
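To make "both cases" concrete: the precomposed form and the "u" plus COMBINING DIAERESIS form are canonically equivalent in Unicode, while "u" followed by the spacing DIAERESIS is not. A minimal sketch in Python, assuming its standard unicodedata module; the function name canonically_equal and the form_* variables are hypothetical, chosen only for this illustration:

```python
import unicodedata

form_a = "\u00fc"    # LATIN SMALL LETTER U WITH DIAERESIS
form_b = "u\u0308"   # LATIN SMALL LETTER U + COMBINING DIAERESIS
form_c = "u\u00a8"   # LATIN SMALL LETTER U + DIAERESIS (spacing)


def canonically_equal(left, right):
    """Compare two strings after bringing both to normalization form NFC."""
    nfc_left = unicodedata.normalize("NFC", left)
    nfc_right = unicodedata.normalize("NFC", right)
    return nfc_left == nfc_right


# A plain code-point comparison sees two different strings ...
print(form_a == form_b)
# ... but after canonical normalization they compare equal.
print(canonically_equal(form_a, form_b))
# The spacing diaeresis is a separate character and does not compose.
print(canonically_equal(form_a, form_c))
```

Only canonical normalization (NFC or NFD) is used here; the compatibility forms NFKC and NFKD treat the spacing DIAERESIS differently and would blur the distinction between the cases.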
Jonas
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel