Keld Jørn Simonsen wrote on 2003-08-10:
> If we stick to ISO 10646 then you need to generate the fully
> composed characters to get the characters. Of course these characters
> are needed, you cannot live without them in most languages in the
> Latin script.
>
Why? What does ISO 10646 lack that Unicode has? I thought they were pretty much the same... Doesn't ISO 10646 define combining characters?
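
To make the combining-character question concrete, here is a minimal Python sketch, assuming the standard library's `unicodedata` module; the helper name `same_text` is made up for illustration. It shows the same letter written once as a single precomposed code point and once as a base letter followed by a combining accent, and that the two only compare equal after normalization.

```python
import unicodedata

composed = "\u00e9"      # e-acute as one precomposed code point
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw code point sequences differ...
print(composed == decomposed)
# ...but the strings are canonically equivalent once normalized.
print(same_text(composed, decomposed))
# These are the UTF-8 bytes a terminal receives for the decomposed form.
print(decomposed.encode("utf-8"))
```

A program that only recognizes the precomposed form would treat the second string as a plain "e" followed by a stray accent, which is the situation discussed in the rest of this message.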
> In Unicode parlance that would mean that you use NFC for the input.
>
> For rendering you just output the fully composed characters. You
> should of course also output the combining characters but that would
> mean further processing in the rendering engine. Some scripts do
> need this further processing in the rendering engine, such as the
> Indic scripts and Hangul Jamo.
>
Since NFC is merely recommended but not required, it is entirely possible for me to have a UTF-8 text file, even in simple European languages, that contains decomposed characters. If I want to be able to ``cat file`` onto the console, it's absolutely required that the console can handle normalization. If it works in one form but not the other, users are going to be *very* surprised.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
