On Sun, Aug 10, 2003 at 07:49:05PM +0300, Beni Cherniavsky wrote:
> Keld Jørn Simonsen wrote on 2003-08-10:
>
> > Well, you probably do not want to go for Unicode with all its
> > normalisation forms etc, but rather for ISO 10646 utf-8.
> >
> The question is, when I type some letter with an accent, will the
> application receive a precomposed form (assume there is one) or the
> base letter followed by a combining character?  Granted, unicode-aware
> programs shouldn't care - but we want to work well with
> unicode-ignorant software (``cat > file``).
>
> Given the popularity of NFC as the preferred form for the web, it
> probably makes sense that the keymap should emit precomposed
> characters when possible.  However, what about a different keymap
> where you have separate keys for typing letters and then adding
> accents?  You can handle normalization in cooked mode but you can also
> live without it.  In any case, raw mode will expose the keymap details
> to the app.
>
> The harder issue is how to handle combining characters on display.
> For many people, you can't live without them.  And you want
> ``cat file`` to work equally whether the file uses precomposed or
> decomposed forms.  So you don't have a choice - you must normalize on
> display.
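
To make the precomposed versus decomposed distinction concrete, here is a minimal sketch in Python using the standard unicodedata module. The example letter and the helper names to_nfc and to_nfd are assumptions chosen for illustration; nothing here is code from the thread or from any keymap or terminal implementation.

```python
import unicodedata

# The same visible letter "é" in two encodings:
precomposed = "\u00e9"   # one code point, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # base letter "e" followed by COMBINING ACUTE ACCENT


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark into a precomposed form where one exists."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Split precomposed characters into base letter + combining marks."""
    return unicodedata.normalize("NFD", text)


# Unicode-ignorant software compares code points (or bytes), so the two differ:
assert precomposed != decomposed

# After normalization to a common form they compare equal:
assert to_nfc(decomposed) == precomposed
assert to_nfd(precomposed) == decomposed

# The UTF-8 byte lengths differ too, which is what a plain `cat` passes through:
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```

A keymap that emits NFC corresponds to producing the first form on input; normalizing on display corresponds to applying something like to_nfc before rendering.
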

If we stick to ISO 10646, then you need to generate the fully composed
characters to get accented characters at all. Of course these characters
are needed; you cannot live without them in most languages in the Latin
script. In Unicode parlance, that would mean that you use NFC for the
input.

For rendering, you just output the fully composed characters. You should
of course also output the combining characters, but that would mean
further processing in the rendering engine. Some scripts do need this
further processing in the rendering engine, such as the Indic scripts
and Hangul Jamo.

Best regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
