"McDonald, Ira" wrote on 2002-02-23 21:45 UTC: > told us to use NFKC (which folds compatibility equivalents into > their base characters).
Well, NFKC is a subset of NFC (every string in NFKC form is also in NFC form), and it is certainly a more "proper" form of Unicode. If given as advice to people who enter new Unicode strings, sticking to NFKC is certainly a good idea, as it eliminates the use of a number of compatibility characters such as the much hated ANGSTROM SIGN.

NFKC is just not suitable for applications that have to deal with already existing text (e.g., a file system) and have to take and preserve whatever information they are given. NFKC also takes away a number of characters that are perfectly usable for terminal emulator applications, e.g., the subscript/superscript digits, but which should not be used in a proper word-processing environment, where there are better ways to select such presentation forms.

So it really depends on the exact application. There are good reasons why different normalization forms exist, even though I am sure there are purists who will say that NFKD is the only clean and proper form of Unicode.
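To make the difference concrete, a rough sketch along the following lines (plain Python, using the standard unicodedata module; the sample characters are just the ones mentioned above) shows what each form does: NFC already folds the ANGSTROM SIGN into U+00C5, but only NFKC discards the distinction carried by the superscript digits.

import unicodedata

def codepoints(s):
    # Render a string as a space-separated list of U+XXXX code points.
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Sketch: compare NFC and NFKC on the characters discussed above.
samples = [
    ("ANGSTROM SIGN",            "\u212B"),   # folded to U+00C5 even by NFC
    ("SUPERSCRIPT TWO",          "\u00B2"),   # untouched by NFC, becomes "2" under NFKC
    ("A + COMBINING RING ABOVE", "A\u030A"),  # composed to U+00C5 by both forms
]

for name, s in samples:
    nfc  = unicodedata.normalize("NFC", s)
    nfkc = unicodedata.normalize("NFKC", s)
    print(f"{name:26}  NFC: {codepoints(nfc):15}  NFKC: {codepoints(nfkc)}")

The output makes the asymmetry visible: a file system that normalized names to NFKC could no longer round-trip a name containing superscript digits, which is exactly the information-loss problem described above.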
Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/