On Thu, Feb 21, 2002 at 01:26:33PM +0900, Gaspar Sinai wrote:
> I just browsed through RFC-3010 and I found one thing that
> bothers me and it has not been discussed yet (I think).
> 
> RFC says:
> > The NFS version 4 protocol does not mandate the use
> > of a particular  normalization form at this time.
> 
> How do we mount something that contains a precomposed
> character like:
> 
>   U+00E1 (Composed of U+0061 and U+0301)
> 
> If the U+0061 U+0301 is used and our server is assuming U+00E1,
> can a malicious hacker set up another NFS server that has
> U+0061 and U+0301 to mount his NFS volume? I could even
> imagine very tricky combinations with Vietnamese text
> but that would be another question...
> 
> Forgive my ignorance if this was discussed - I did not see it
> in the archives.
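
(For what it's worth, the two spellings really are different byte
strings on the wire, so a server comparing names byte-for-byte sees
two unrelated files.  A quick check -- Python here just for
illustration, any language shows the same thing:

    precomposed = "\u00e1"   # U+00E1
    decomposed = "a\u0301"   # U+0061 followed by U+0301
    print(precomposed.encode("utf-8"))  # b'\xc3\xa1'
    print(decomposed.encode("utf-8"))   # b'a\xcc\x81'
    print(precomposed == decomposed)    # False

so nothing at the byte level ties the two forms together.)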

One thing that's bound to be lost in the transition to UTF-8 filenames:
the ability to reference any file on the filesystem with a pure CLI.
If I see a file with a pi symbol in it, I simply can't type that; I have
to copy and paste it or wildcard it.  If I have a filename with all
Kanji, I can only use wildcards.

A normalization form would help a lot, though. It'd guarantee that in
all cases where I *do* know how to enter a character in a filename,
I can always manipulate the file.  (If I see "cár", I'd be able to "cat
cár" and see it, reliably.)

I don't know who would actually normalize filenames, though--a shell
can't just normalize all args (not all args are filenames) and doing it
in all tools would be unreliable.
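
The closest thing I can picture is a fallback at lookup time -- a
purely hypothetical sketch, and it really belongs in the filesystem
rather than in every tool:

    import unicodedata

    def open_normalizing(path, mode="r"):
        # Hypothetical helper: try the name exactly as given, then
        # retry with its NFC form before giving up.
        try:
            return open(path, mode)
        except FileNotFoundError:
            return open(unicodedata.normalize("NFC", path), mode)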

A mandatory normalization form would also eliminate visibly duplicate
filenames.  Of course, it can't be enforced, but tools that escape
filenames for output could change unnormalized text to \u/\U.
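
Something along these lines, say -- just a sketch of the escaping
idea, not any existing tool's behavior:

    import unicodedata

    def escape_unnormalized(name):
        # Pass NFC-clean names through untouched; for anything else,
        # escape the non-ASCII characters as \uXXXX / \UXXXXXXXX so
        # visually identical duplicates become distinguishable.
        if unicodedata.is_normalized("NFC", name):
            return name
        return "".join(
            ch if ord(ch) < 128
            else ("\\u%04x" % ord(ch) if ord(ch) <= 0xffff
                  else "\\U%08x" % ord(ch))
            for ch in name
        )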

I don't quite understand the scenario you're trying to describe, though.

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
