> Users expect that "�" == "�", and don't know or care about Unicode, and
> that's reasonable.
As a side note, I copy/pasted a command-line flag from a RH8.0 manpage back into the console and tried to execute the command. It failed and printed the usage message. The reason, I discovered, is that the manpage was not using a regular ASCII '-', but one of the HYPHEN or EM DASH characters (which is why I HATE them).

Manpages that show things such as command names or parameters should NOT use glyphs other than the actual ones the command uses. In the same way, you do NOT capitalize "grep" even if it is the first word in a sentence. Nor do you punctuate inside literal strings. (**example EVIL sentence: "Grep" is a command like "sed." **) Major gripe.

Regardless, I don't think the O/S or filesystem code should enforce, require, or even know about normalization forms. Instead, a well-designed user interface should simply show non-normalized, overlong, or invalid UTF-8 sequences as mojibake, in some standard way (such as big rectangles), so that they can still be copy/pasted and worked with, but are not easily confused with proper text. The input method would always generate normalized UTF-8, naturally.

Normalization has its niceties, but it should never be forced or assumed. For example, how many apps that accept UTF8_STRING pastes proceed to filter out invalid UTF-8 sequences, normalize, or otherwise post-process the data? This is bad practice, imo. (It makes sense for some apps, but not for toolkits or general-purpose text editors.)

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
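
For illustration only, here is a minimal sketch of the kind of check a display layer could run before deciding whether to draw text normally or as flagged rectangles. It is not taken from any existing toolkit: the function name `classify` is hypothetical, and the choice of NFC as the "normal" form is an assumption, since the post does not name a specific normalization form. It relies only on Python's strict UTF-8 decoder and the standard `unicodedata` module, and it only reports; it never modifies the bytes.

```python
import unicodedata


def classify(raw: bytes) -> str:
    """Classify a byte string without altering it (hypothetical helper).

    Returns "invalid", "non-normalized", or "ok".
    """
    try:
        # Python's strict decoder rejects malformed input, including
        # overlong encodings and truncated multi-byte sequences.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"

    # Assumption: NFC is treated as the expected normalization form.
    if unicodedata.normalize("NFC", text) != text:
        return "non-normalized"

    return "ok"
```

One caveat: a check like this would not have caught the manpage problem described above. U+2010 HYPHEN is valid UTF-8 and is already in NFC, so `classify("\u2010".encode("utf-8"))` reports it as fine. Lookalike characters are a separate issue from normalization.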
