>Users expect that "�" == "�", and don't know or care about Unicode, and
>that's reasonable.

As a side note, I copy/pasted a command-line flag from a RH8.0
manpage back into the console and tried to execute the command.

It failed and printed the usage message. The reason, I discovered,
is that the manpage was not using a regular ASCII '-', but instead
one of the HYPHEN or EM DASH things (which is why I HATE them).
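
For what it's worth, this kind of look-alike is easy to spot mechanically.
Here is a minimal Python sketch, not anything the manpage tools actually do:
the function name find_lookalike_dashes and the sample string are made up
for illustration. It flags every character in Unicode category Pd
(Punctuation, dash) other than the ASCII '-', plus U+2212 MINUS SIGN, which
sits in a different category.

```python
import unicodedata

def find_lookalike_dashes(text):
    """Return (index, char, Unicode name) for dash-like characters that are not ASCII '-'."""
    hits = []
    for i, ch in enumerate(text):
        # 'Pd' covers HYPHEN, EN DASH, EM DASH and friends.
        # U+2212 MINUS SIGN is category 'Sm', so it is checked separately.
        if ch != "-" and (unicodedata.category(ch) == "Pd" or ch == "\u2212"):
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical pasted flag that uses U+2010 HYPHEN instead of '-'.
pasted = "ls \u2010\u2010color"
for pos, ch, name in find_lookalike_dashes(pasted):
    print(pos, f"U+{ord(ch):04X}", name)
```

A shell or terminal could run a check like this on pasted text and warn,
without rewriting anything behind the user's back.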

Manpages that show things such as command names or
parameters should NOT use glyphs other than the actual
ones the command uses. In the same way, you do NOT
capitalize "grep" even if it is the first word in a sentence.
Nor do you punctuate inside literal strings.
(**example EVIL sentence: "Grep" is a command like "sed." **)
Major gripe.

Regardless, I don't think the O/S or filesystem code should
enforce, require, or even know about normalization forms.
Instead, a well-designed user interface should simply show
non-normalized, over-coded, or invalid UTF-8 sequences as
bakemoji, in some standard way (such as big rectangles),
such that they can still be copy/pasted and worked with, but
not easily confused with proper stuff. The input method
would always generate normal UTF-8, naturally.
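
A rough Python sketch of what I mean on the display side. Everything here
is an assumption for illustration: the names classify and display, the
choice of NFC as the "normal" form, and U+25AF as the rectangle. Python's
strict UTF-8 decoder also rejects overlong encodings, which is what I take
"over-coded" to cover.

```python
import unicodedata

BOX = "\u25af"  # WHITE VERTICAL RECTANGLE, an arbitrary stand-in glyph

def classify(raw: bytes) -> str:
    """Label a byte string as 'invalid', 'non-normalized', or 'ok'."""
    try:
        text = raw.decode("utf-8")  # strict: rejects overlong and truncated forms
    except UnicodeDecodeError:
        return "invalid"
    # Assumption: "normal" means NFC here.
    if unicodedata.normalize("NFC", text) != text:
        return "non-normalized"
    return "ok"

def display(raw: bytes) -> str:
    """Render for the screen only: each undecodable byte becomes a box."""
    # surrogateescape maps each bad byte 0x80..0xFF to U+DC80..U+DCFF.
    text = raw.decode("utf-8", errors="surrogateescape")
    return "".join(BOX if 0xDC80 <= ord(c) <= 0xDCFF else c for c in text)

name = b"a\xc0\xafb"  # contains an overlong encoding of '/'
print(classify(name), display(name))

# The underlying bytes are never modified, so copy/paste still round-trips.
roundtrip = name.decode("utf-8", errors="surrogateescape")
assert roundtrip.encode("utf-8", errors="surrogateescape") == name
```

The point of the design is that only the rendering changes; the data handed
to copy/paste or to the filesystem stays byte-for-byte what it was.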

Normalization has its niceties, but it should never be
forced or assumed. For example, how many apps that accept
UTF8_STRING pastes proceed to filter out invalid UTF-8
sequences, normalize, or otherwise post-process the data?
This is bad practice, IMO. (It makes sense for some
apps, but not for toolkits or general-purpose text editors.)



--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
