On Wed, Dec 04, 2002 at 12:49:15PM -0500, Maiorana, Jason wrote:
> As a side-note, I copy/pasted a command line flag from a RH8.0
> manpage back into the console, and tried to execute the command.
> 
> It failed, and gave me usage. The reason, I discovered, is that
> the manpage was not using a regular ascii '-', but instead one
> of the HYPEN, or EM_DASH things (Which is why i HATE them).

I think they're perfectly useful, including in manpages, but I agree
they shouldn't be used in syntax displays.  (Unless the application can
actually handle them; which would, in fact, be neat in a novel way,
though I think that would ultimately be a bad idea.  :)

> Irregardless, I dont think the O/S or filesystem code should
> enforce, require, or even know about normalization forms.
> Instead, a well designed user interface should simply show
> non-normalized, over-coded, or invalid UTF-8 sequences as
> bakemoji, in some standard way (such as big rectangles),
> such that it can still be copy/pasted and worked with, but
> not easily confused with proper stuff. The input method
> would always generate normal utf-8, naturally.

It's not clear whose responsibility this is.  There are quite a few
things that are invalid, and they're not easy to handle at every layer.

For example, suppose you have a filename that begins with a combining
character.  If it's the terminal's job to deal with weird output, it
can't do that here; if you run 'ls', the combining character will just
get attached to the whitespace preceding the filename.  ls has to handle
it.
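
To make the leading-combining-character case concrete, here's a rough
Python sketch of the check a tool like ls would need.  The function name
is made up for illustration, and I'm assuming "combining character"
means any code point whose general category is a Mark (Mn, Mc, Me).

```python
import unicodedata

def starts_with_combining(name: str) -> bool:
    """True if the first code point is a combining mark (category M*)."""
    return bool(name) and unicodedata.category(name[0]).startswith("M")

# U+0301 COMBINING ACUTE ACCENT followed by "abc": in a column listing
# the accent would attach to whatever precedes the filename.
print(starts_with_combining("\u0301abc"))  # True
print(starts_with_combining("abc"))        # False
```

The terminal only sees a stream of characters, so it can't tell where a
filename starts; only the tool producing the listing can.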

It's probably the terminal's job only insofar as always sending NFC when
the user types (which seems to be the de facto standard, at least); beyond
that it seems to be the job of tools.  Pasting is a little fuzzier.  What
if I'm in Windows, and some other app I'm using uses NFD (for some, possibly
valid, reason)?  I don't want my terminal pasting text from that app in NFD
(since it'll result in filenames on my system in NFD, for example).
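
As a small illustration of what normalizing on paste would mean, here's
a sketch in Python.  normalize_paste is a hypothetical hook name, not
something any terminal actually exposes; unicodedata.normalize is the
standard library call.

```python
import unicodedata

def normalize_paste(text: str) -> str:
    """Hypothetical paste hook: convert incoming text to NFC."""
    return unicodedata.normalize("NFC", text)

nfd = "cafe\u0301"          # "e" followed by COMBINING ACUTE ACCENT (NFD)
nfc = normalize_paste(nfd)  # single precomposed U+00E9

print(len(nfd), len(nfc))   # 5 4
print(nfc == "caf\u00e9")   # True
```

Both strings look the same on screen, but they're different byte
sequences, so as filenames they'd name two different files.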

If the shell interface is designed to allow me to do everything in NFC
(e.g. by having ls and friends escape anything that's not in NFC, along
with all of the other things they should be escaping), then it shouldn't
be a problem to have terminals normalize the text they send to NFC.

I think it's important that, in the end, I'm always consistently able to
reference any filename displayed by ls via copy-and-paste; otherwise
I'll have to go to annoying lengths to, for example, delete a file with
a bad filename.

Note that when I'm talking about ls escaping text, I mean that it
should have a new flag indicating that it's allowed to use \u and \U
escapes, and that it should use those (and \x) for escaping UTF-8-related
things; this would combine with whatever --quoting-style is in use, and
might be good to have on by default.  Things that would be useful to
escape are invalid or overlong UTF-8 sequences, using \x; combining
characters at the beginning of filenames; too many combining characters
(with a configurable limit); anything of width zero that isn't a
combining character (such as control characters); and possibly anything
that isn't in NFC (all of these with \u and \U).
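
Here's a rough sketch of those escaping rules in Python, just to show
they're implementable in one pass.  Everything in it is my own
assumption rather than existing ls behavior: the names escape_filename,
max_combining and escape_non_nfc are invented, "too many" is counted as
consecutive marks on one base, "width zero" is approximated by
categories Cc and Cf, and a name that isn't in NFC gets all of its
non-ASCII characters escaped (crude, but unambiguous).  Backslashes in
the name itself are left to the quoting style.

```python
import unicodedata

def _u(cp: int) -> str:
    """Format a code point as \\uXXXX, or \\UXXXXXXXX beyond the BMP."""
    return f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"

def escape_filename(raw: bytes, max_combining: int = 2,
                    escape_non_nfc: bool = True) -> str:
    # Invalid or overlong bytes survive decoding as lone surrogates
    # U+DC80..U+DCFF, which lets us print them back as \xHH.
    text = raw.decode("utf-8", errors="surrogateescape")
    not_nfc = escape_non_nfc and unicodedata.normalize("NFC", text) != text
    out = []
    have_base = False  # have we emitted a character a mark can attach to?
    run = 0            # consecutive combining marks on the current base
    for ch in text:
        cp = ord(ch)
        cat = unicodedata.category(ch)
        if 0xDC80 <= cp <= 0xDCFF:            # undecodable byte
            out.append(f"\\x{cp - 0xDC00:02x}")
            have_base, run = False, 0
        elif cat.startswith("M"):             # combining mark
            run += 1
            if not have_base or run > max_combining or not_nfc:
                out.append(_u(cp))
            else:
                out.append(ch)
        elif cat in ("Cc", "Cf"):             # control / format characters
            out.append(_u(cp))
            have_base, run = False, 0
        else:                                 # ordinary character
            out.append(_u(cp) if (not_nfc and cp > 0x7F) else ch)
            have_base, run = True, 0
    return "".join(out)

print(escape_filename(b"\xcc\x81abc"))            # \u0301abc
print(escape_filename(b"\xc0\xafx"))              # \xc0\xafx
print(escape_filename("cafe\u0301".encode()))     # cafe\u0301
```

The escaped form is plain ASCII wherever something odd is going on, so
it can be copied and pasted back without the terminal changing it,
provided the shell or tool on the receiving end understands the same
escapes.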

(But, of course, none of this should be enforced by the kernel or libc; I
think everyone is in agreement here.)

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
