"McDonald, Ira" wrote on 2002-02-23 21:45 UTC:
> told us to use NFKC (which folds compatibility equivalents into
> their base characters).

Well, NFKC is a subset of NFC (every string in NFKC is also in NFC), and
it is certainly a more "proper" form of Unicode. As advice to people who
enter new Unicode strings, sticking to NFKC is a good idea, because it
eliminates a number of compatibility characters such as the much-hated
ANGSTROM SIGN.
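Here is a minimal Python sketch of how the two forms relate, using the standard unicodedata module. The sample strings are illustrative choices. It checks two things. First, NFC leaves NFKC output unchanged. Second, NFC keeps the U+FB01 "fi" ligature, while NFKC folds it into plain "fi". U+212B ANGSTROM SIGN is a canonical singleton, so with current Unicode data even NFC replaces it with U+00C5.

```python
import unicodedata

def nfc(s: str) -> str:
    """Canonical composition: keeps compatibility characters such as ligatures."""
    return unicodedata.normalize("NFC", s)

def nfkc(s: str) -> str:
    """Compatibility composition: also folds compatibility characters."""
    return unicodedata.normalize("NFKC", s)

# Illustrative samples: "fi" ligature, ANGSTROM SIGN, A WITH RING ABOVE, superscript two.
samples = ["\ufb01le", "\u212b", "\u00c5", "x\u00b2"]

for s in samples:
    k = nfkc(s)
    # Every NFKC string is already in NFC, so applying NFC again changes nothing.
    assert nfc(k) == k
    print(repr(s), "NFC:", repr(nfc(s)), "NFKC:", repr(k))
```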

NFKC is just not suitable for applications that have to deal with
already existing text (e.g., a file system) and must accept and preserve
whatever information they are given. NFKC also removes a number of
characters that are perfectly usable in terminal emulator applications,
such as the subscript/superscript digits. These should not be used in a
proper word-processing environment, where there are better ways to
select such presentation forms.
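The file-system problem is easy to demonstrate. The sketch below uses Python's unicodedata module. The file names and the helper nfkc_collisions are hypothetical. It groups names that become identical after NFKC folding. A system that silently applied NFKC could not keep such names apart. NFC, by contrast, leaves the superscript digit alone.

```python
import unicodedata

def nfkc_collisions(names):
    """Group names that become identical after NFKC folding (hypothetical helper)."""
    groups = {}
    for name in names:
        key = unicodedata.normalize("NFKC", name)
        groups.setdefault(key, []).append(name)
    # Keep only keys that more than one distinct original name maps to.
    return {k: v for k, v in groups.items() if len(v) > 1}

# Hypothetical directory listing: U+00B2 SUPERSCRIPT TWO, U+2460 CIRCLED DIGIT ONE.
files = ["report2.txt", "report\u00b2.txt", "\u2460.txt", "1.txt", "notes.txt"]
print(nfkc_collisions(files))
```

Under NFKC, "report².txt" and "report2.txt" map to the same name, and so do "①.txt" and "1.txt". Under NFC, all five names stay distinct.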

So it really depends on the exact application. There are good reasons
why different normalization forms exist, even though I am sure there are
purists who will say that NFKD is the only clean and proper form of
Unicode.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/