Keld Jørn Simonsen wrote on 2003-08-10:

> If we stick to ISO 10646 then you need to generate the fully
> composed characters to get the characters. Of course these characters
> are needed; you cannot live without them in most languages in the
> Latin script.
>
Why?  What does ISO 10646 lack that Unicode has?  I thought they are
pretty much the same...  Doesn't ISO 10646 define combining
characters?

> In Unicode parlance that would mean that you use NFC for the input.
>
> For rendering you just output the fully composed characters.  You
> should of course also output the combining characters but that would
> mean further processing in the rendering engine. Some scripts do
> need this further processing in the rendering engine, such as the
> Indic scripts and Hangul Jamo.
>
Since NFC is merely recommended but not required, it is entirely
possible for me to have a UTF-8 text file, even in a simple European
language, that contains decomposed characters.  If I want to be able
to ``cat file`` onto the console, it's absolutely required that the
console can handle normalization.  If it works in one form but not the
other, users are going to be *very* surprised.
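
To make the concern concrete, here is a minimal Python sketch of the
two forms.  It is only an illustration using the standard
``unicodedata`` module; the helper name ``to_nfc`` is made up for this
example and has nothing to do with how any console is implemented.

```python
import unicodedata

# Two encodings of the same visible character "é".
composed = "\u00e9"      # precomposed: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def to_nfc(text: str) -> str:
    """Hypothetical helper: fold any input into the composed (NFC) form."""
    return unicodedata.normalize("NFC", text)


# The two strings differ as code point sequences (and as UTF-8 bytes),
# so anything that only knows the precomposed form will mishandle the
# decomposed one unless it normalizes first.
print(composed == decomposed)               # False
print(composed.encode("utf-8"))             # b'\xc3\xa9'
print(decomposed.encode("utf-8"))           # b'e\xcc\x81'

# After normalization they compare equal.
print(to_nfc(decomposed) == composed)       # True
```

A file containing the second byte sequence is perfectly valid UTF-8,
which is why I would expect a console either to normalize its input or
to render combining sequences directly.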

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
