On Thu, Feb 21, 2002 at 11:23:20AM +0000, Edmund GRIMLEY EVANS wrote:
> People are advocating normalisation as a solution for various kinds of
> file name confusion, but I can imagine normalisation making things
> worse.
> 
> For example, file names with a trailing space can certainly be
> confusing, but would life be any simpler if some programmer decided to
> strip trailing white space at some point in the processing of a file
> name? I don't think so. You would then potentially have files that are
> not just hard to delete, but impossible to delete.

If I have two computers, one sending precomposed characters and one
sending decomposed ones, I can't access a "câr" file created on one from
the other.  If terminal emulators, IMs, etc. send normalized characters,
this isn't a problem.  (It doesn't fix all problems, but it would help
fix up some of the major ones.)
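
A rough Python sketch of the mismatch described above, assuming NFC is the
form the input layer settles on (the choice of NFC and the helper name
normalize_input are illustrative assumptions, not something any terminal
emulator is known to implement this way):

```python
import unicodedata

# Two spellings of the same visible name.
precomposed = "c\u00e2r"    # 'â' as a single code point, U+00E2
decomposed = "ca\u0302r"    # 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT

# They look identical on screen but are different byte strings, so a
# byte-wise filename lookup using one spelling will not find the other.
assert precomposed.encode("utf-8") != decomposed.encode("utf-8")


def normalize_input(text: str) -> str:
    """Hypothetical hook: what an input layer could apply before sending text."""
    return unicodedata.normalize("NFC", text)
```

If every input source applies the same step, both machines end up typing
the same bytes for the name, without the filesystem touching anything.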

Then, if ls displays a filename which doesn't fit the normalization form
expected in filenames, it can display it in a way that shows what it
really is ("c\u00E2r").  (Optional, of course.)  This is less useful
given the other unavoidable glyph ambiguities, though.
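
A minimal sketch of that display rule in Python.  The function names and
the exact escape format are assumptions made for illustration; no existing
ls option is implied.

```python
import unicodedata


def escape_non_ascii(name: str) -> str:
    """Spell out every non-ASCII code point as \\uXXXX (or \\UXXXXXXXX)."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def display_name(name: str, expected_form: str = "NFC") -> str:
    """Show a name as-is if it is in the expected form, escaped otherwise."""
    if unicodedata.normalize(expected_form, name) == name:
        return name
    return escape_non_ascii(name)
```

With NFD as the expected form, the precomposed name comes out as
c\u00E2r, while the decomposed spelling is displayed untouched.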

"cat" certainly shouldn't normalize its arguments.

> I'm not even convinced that it's a good idea to force file names to be
> in UTF-8. Perhaps it would be simpler and more robust to let file
> names be any null-terminated string of octets and just recommend that
> people use (some normalisation form of) UTF-8. That way you won't have
> the problem of some files (with ill-formed names) being visible
> locally but not remotely because the server or the client is either
> blocking the names or "normalising" them in some weird and unexpected
> way.

I'm not suggesting NFS normalize anything; this is just as important on
a single system being accessed from multiple terminals.

Sorry, the switch from NFS to filenames in general wasn't clear.

> What's so bad about just being 8-bit clean?

Oh, network protocols *should* be 8-bit clean for filenames (minus nul).
If I have a remote file with an invalid filename (overlong UTF-8
sequence or just plain garbage), I'd better be able to access it over
NFS.  I don't think the FS (NFS, local filesystem, FTP, whatever) should
touch filenames at all.  (Mandating that they be UTF-8 in the standard
is a good thing; enforcing it at the FS layer is not.)
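
To illustrate the split between checking and enforcing, here is a small
Python sketch.  The sample name and the helper names are made up for the
example; the point is only that a tool can flag or escape an ill-formed
name while still carrying the original bytes unchanged.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Report whether a raw filename is well-formed UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_display(name: bytes) -> str:
    """Render a raw filename for a human, escaping undecodable bytes."""
    return name.decode("utf-8", errors="backslashreplace")


def round_trip(name: bytes) -> bytes:
    """Carry a raw filename through a str and back without losing bytes."""
    as_text = name.decode("utf-8", errors="surrogateescape")
    return as_text.encode("utf-8", errors="surrogateescape")


# Hypothetical bad name: 0xC0 0xAF is an overlong encoding of '/'.
bad_name = b"c\xc0\xafr"
```

Nothing here rejects or rewrites bad_name; it is only reported as invalid
and escaped for display, so the file stays reachable by its real bytes.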

Related: I frequently can't touch files over Samba whose names contain
non-English characters or characters Windows bans from filenames.
Windows displays them as some random-looking series of characters, and it
doesn't always map back correctly.  This doesn't really have anything to do
with the network protocol--though the actual implementation problem might
be in there--it's that it doesn't deal with "invalid" filenames properly.

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
