On 11 Nov 2008, at 13:15, Michael Schnell wrote:

OTOH, in this special case, I don't see why the compiler should "normalize" "u¨" to "ü". If the software is supposed to be handling Unicode, the Unicode string "u¨" should be considered a perfectly legal two-code-point sequence consisting of a "u" (a single byte in UTF-8) and a double dot (supposedly two bytes in UTF-8).

Note that I was simplifying. It's not actually "u¨", but "u" followed by the code point meaning "put ¨ on top of the preceding character". In other words, there are (all in UTF-8):

a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by "COMBINING DIAERESIS", which is encoded as $CC $88 c) "u¨": "LATIN SMALL LETTER U", encoded as $75, followed by "DIAERESIS", which is encoded as $C2 $A8

If the user wants to handle this as a single "ü", he should write appropriate code for that. Any automation on that is dangerous.

The character combination literally means "ü" in both cases a) and b); it is not up to the user to decide whether or not it means "ü".
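
That equivalence is exactly what Unicode canonical normalization expresses: a) and b) are canonically equivalent, while c) is not. A quick check (again in Python, using its unicodedata module, purely as an illustration):

    import unicodedata

    a = "\u00FC"   # precomposed u with diaeresis
    b = "u\u0308"  # u followed by COMBINING DIAERESIS
    c = "u\u00A8"  # u followed by the spacing DIAERESIS

    print(unicodedata.normalize("NFC", b) == a)  # True: b composes to a under NFC
    print(unicodedata.normalize("NFD", a) == b)  # True: a decomposes to b under NFD
    print(unicodedata.normalize("NFC", c) == a)  # False: U+00A8 is not a combining mark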


Jonas