On Wed, Dec 04, 2002 at 12:49:15PM -0500, Maiorana, Jason wrote:
> As a side-note, I copy/pasted a command line flag from a RH8.0
> manpage back into the console, and tried to execute the command.
>
> It failed, and gave me usage. The reason, I discovered, is that
> the manpage was not using a regular ascii '-', but instead one
> of the HYPEN, or EM_DASH things (Which is why i HATE them).

I think they're perfectly useful, including in manpages, but I agree
they shouldn't be used in syntax displays. (Unless the application can
actually handle them; which would, in fact, be neat in a novel way,
though I think that would ultimately be a bad idea. :)

> Irregardless, I dont think the O/S or filesystem code should
> enforce, require, or even know about normalization forms.
> Instead, a well designed user interface should simply show
> non-normalized, over-coded, or invalid UTF-8 sequences as
> bakemoji, in some standard way (such as big rectangles),
> such that it can still be copy/pasted and worked with, but
> not easily confused with proper stuff. The input method
> would always generate normal utf-8, naturally.

It's not clear whose responsibility this is. There are quite a few
things that are invalid, and they're not easy to handle at every layer.

For example, suppose you have a filename that begins with a combining
character. If it's the terminal's job to deal with weird output, it
can't do that here; if you run 'ls', the combining character will just
get attached to the whitespace preceding the filename. ls has to handle
it.

It's probably the terminal's job only so far as always sending NFC when
the user types (which seems to be the de facto standard, at least);
beyond that it seems to be the job of tools.

Pasting is a little fuzzier. What if I'm in Windows, and some other app
I'm using uses NFD (for some, possibly valid, reason)? I don't want my
terminal pasting text from that app in NFD (since it'll result in
filenames on my system in NFD, for example). If the shell interface is
designed to allow me to do everything in NFC (e.g. by having ls and
friends escape anything that's not in NFC, along with all of the other
things it should be escaping), then it shouldn't be a problem to have
terminals normalize output text to NFC.

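To make the "ls and friends escape anything that's not in NFC" idea a
bit more concrete, here is a rough Python sketch of what such an
escaping pass might look like. Everything in it is an assumption for
illustration: escape_filename and u_escape are hypothetical names, no
real ls behaves this way, and the policy (\x for bytes that aren't valid
UTF-8, \u and \U for a leading combining character, for control and
other zero-width code points, and for every non-ASCII character when the
name as a whole isn't in NFC) is just one possible reading, with the
non-NFC rule deliberately cruder than a real tool would want.

```python
import unicodedata


def u_escape(ch: str) -> str:
    """Render one code point as \\uXXXX, or \\UXXXXXXXX above the BMP."""
    cp = ord(ch)
    return f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"


def escape_filename(raw: bytes) -> str:
    """Hypothetical escaping policy for display; not what any real ls does."""
    # Bytes that are not valid UTF-8 (overlong forms included) come back
    # as lone surrogates U+DC80..U+DCFF, one per offending byte.
    text = raw.decode("utf-8", errors="surrogateescape")
    in_nfc = unicodedata.normalize("NFC", text) == text

    out = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        category = unicodedata.category(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Invalid byte: show it as \xHH.
            out.append(f"\\x{cp - 0xDC00:02x}")
        elif ch == "\\":
            # Double literal backslashes so the output stays unambiguous.
            out.append("\\\\")
        elif category.startswith("C"):
            # Control, format, unassigned and similar code points.
            out.append(u_escape(ch))
        elif i == 0 and category.startswith("M"):
            # Combining mark with nothing in the filename to attach to.
            out.append(u_escape(ch))
        elif not in_nfc and cp > 0x7F:
            # Assumed simplification: if the name is not in NFC, escape
            # every non-ASCII character rather than only the offenders.
            out.append(u_escape(ch))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, escape_filename(b"\xc0\x80") gives the text \xc0\x80,
and a decomposed "e" followed by U+0301 gives e\u0301, while the
precomposed U+00E9 passes through untouched. The point is only that
every name stays visibly distinct and can be copied back from the
listing; how a shell would turn the escapes back into bytes is left
open here.
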
I think it's important that, in the end, I'm always consistently able to
reference any filename displayed by ls via copy-and-paste; otherwise
I'll have to go to annoying lengths to, for example, delete a file with
a bad filename.

Note that when I'm talking about ls escaping text, I mean that it should
have a new flag indicating that it's allowed to use \u and \U escapes
and that it should use those--and \x--for escaping UTF-8-related things;
this would combine with whatever --quoting-style is in use, and might be
good to default to being on.

Things that would be useful to escape are invalid/overlong UTF-8
sequences, using \x; combining characters at the beginning of filenames;
too many combining characters (configurable); anything of width zero
that isn't a combining character (control characters); and possibly
anything that isn't in NFC (all with \u and \U).

(But, of course, none of this should be enforced by the kernel or libc;
I think everyone is in agreement here.)

-- 
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
