On Thu, Feb 21, 2002 at 11:23:20AM +0000, Edmund GRIMLEY EVANS wrote:
> People are advocating normalisation as a solution for various kinds of
> file name confusion, but I can imagine normalisation making things
> worse.
>
> For example, file names with a trailing space can certainly be
> confusing, but would life be any simpler if some programmer decided to
> strip trailing white space at some point in the processing of a file
> name? I don't think so. You would then potentially have files that are
> not just hard to delete, but impossible to delete.
If I have two computers, one sending precomposed and one not, I can't
access my "câr" file created on one on the other. If terminal emulators,
IMs, etc. send normalized characters, this isn't a problem. (It doesn't
fix all problems, but it would help fix up some of the major ones.)
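
To make the precomposed/decomposed mismatch concrete, here is a minimal
Python sketch. It uses the standard unicodedata module; the helper name
same_name and the choice of NFC as the default form are my own assumptions
for illustration, not something any terminal or IM actually implements.

```python
import unicodedata

# The same visible name, typed on two different machines.
precomposed = "c\u00e2r"   # U+00E2: a with circumflex as one code point
decomposed = "ca\u0302r"   # 'a' followed by U+0302 combining circumflex


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form.

    Hypothetical helper: this is what an input layer would achieve
    implicitly if it always sent one normalization form.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    # The raw byte sequences differ, so a plain lookup by name misses.
    print(precomposed.encode("utf-8"))
    print(decomposed.encode("utf-8"))
    print(precomposed == decomposed)
    print(same_name(precomposed, decomposed))
```

The point is that the comparison only succeeds if normalization happens
where the name is typed, not in the filesystem.
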
Then, if a filename is being displayed by ls which doesn't fit the
normalization form expected in filenames, display it in a way that shows
what it really is. ("c\u00E2r".) (Optional, of course.) This is less
useful with the other unavoidable glyph ambiguities, though.
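
A rough sketch of that display rule, in Python. The function name
display_name and the form parameter are hypothetical; this is not how any
existing ls behaves, just one way the idea could look. Names that already
match the expected form are shown untouched, anything else gets its
non-ASCII code points escaped.

```python
import unicodedata


def display_name(name: str, form: str = "NFC") -> str:
    """Return name unchanged if it is already in the expected form,
    otherwise escape its non-ASCII code points so the reader can see
    what is really stored. (Illustrative only.)
    """
    if unicodedata.normalize(form, name) == name:
        return name
    parts = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)


if __name__ == "__main__":
    # Assume, for this example only, that NFD is the expected form.
    print(display_name("c\u00e2r", form="NFD"))   # precomposed: escaped
    print(display_name("ca\u0302r", form="NFD"))  # decomposed: shown as-is
```
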
"cat" certainly shouldn't normalize its arguments.
> I'm not even convinced that it's a good idea to force file names to be
> in UTF-8. Perhaps it would be simpler and more robust to let file
> names be any null-terminated string of octets and just recommend that
> people use (some normalisation form of) UTF-8. That way you won't have
> the problem of some files (with ill-formed names) being visible
> locally but not remotely because the server or the client is either
> blocking the names or "normalising" them in some weird and unexpected
> way.
I'm not suggesting NFS normalize anything; this is just as important on
a single system being accessed from multiple terminals.
Sorry, the switch from NFS to filenames in general wasn't clear.
> What's so bad about just being 8-bit clean?
Oh, network protocols *should* be 8-bit clean for filenames (minus nul).
If I have a remote file with an invalid filename (overlong UTF-8
sequence or just plain garbage), I'd better be able to access it over
NFS. I don't think the FS (NFS, local filesystem, FTP, whatever) should
touch filenames at all. (Mandating that they be UTF-8 in the standard
is a good thing; enforcing it at the FS layer is not.)
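
Here is a small Python sketch of the "recommend, don't enforce" split. The
function names are made up for illustration. The transport check only
rejects nul, the UTF-8 check is purely advisory, and the last helper shows
that a tool can hold an ill-formed name as text and still hand back the
exact original bytes (using Python's surrogateescape error handler).

```python
def transportable(raw: bytes) -> bool:
    """8-bit clean minus nul: the only thing the transport should reject."""
    return b"\x00" not in raw


def is_valid_utf8(raw: bytes) -> bool:
    """Advisory check only; a failure here must not block access."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def round_trip(raw: bytes) -> bytes:
    """Decode to text and back without losing ill-formed bytes."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    names = [
        b"c\xc3\xa2r",   # well-formed UTF-8
        b"\xc0\xaf",     # overlong encoding of '/'
        b"caf\xe9",      # Latin-1 leftovers, not UTF-8
    ]
    for raw in names:
        print(raw, transportable(raw), is_valid_utf8(raw),
              round_trip(raw) == raw)
```
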
Related: over Samba, I frequently can't touch files whose names contain
non-English characters, or characters that Windows bans from filenames.
Windows displays them as some random-looking series of characters, and it
doesn't always map back correctly. This doesn't really have anything to do
with the network protocol--though the actual implementation problem might
be in there--it's that it doesn't deal with "invalid" filenames properly.
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/