On Thu, Feb 21, 2002 at 01:26:33PM +0900, Gaspar Sinai wrote:
> I just browsed through RFC-3010 and I found one thing that
> bothers me and it has not been discussed yet (I think).
>
> RFC says:
> > The NFS version 4 protocol does not mandate the use
> > of a particular normalization form at this time.
>
> How do we mount something that contains a precomposed
> character like:
>
> U+00E1 (Composed of U+0061 and U+0301)
>
> If the U+0061 U+0301 is used and our server is assuming U+00E1,
> can a malicious hacker set up another NFS server that has
> U+0061 and U+0301 to mount his NFS volume? I could even
> imagine very tricky combinations with Vietnamese text
> but that would be another question...
>
> Forgive my ignorance if this was discussed - I did not see it
> in the archives.
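[To make the precomposed/decomposed ambiguity concrete, here is a small Python sketch, not part of the original mail: U+00E1 and the sequence U+0061 U+0301 render identically but are distinct strings and distinct UTF-8 byte sequences until a normalization form (NFC or NFD) is applied.]

```python
import unicodedata

precomposed = "\u00e1"        # a-acute as a single code point
decomposed = "a\u0301"        # 'a' followed by combining acute accent

# The two forms look the same on screen but compare unequal,
# both as strings and as UTF-8 byte sequences:
assert precomposed != decomposed
assert precomposed.encode("utf-8") != decomposed.encode("utf-8")

# Normalizing both sides to the same form makes them compare equal:
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

[So a server that stores U+00E1 and a client that sends U+0061 U+0301 will not match filenames unless one side normalizes, which is the gap the quoted RFC text leaves open.]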
One thing that's bound to be lost in the transition to UTF-8 filenames:
the ability to reference any file on the filesystem with a pure CLI. If
I see a file with a pi symbol in it, I simply can't type that; I have to
copy and paste it or wildcard it. If I have a filename that's all Kanji,
I can only use wildcards.

A normalization form would help a lot, though. It'd guarantee that in
all cases where I *do* know how to enter a character in a filename, I
can always manipulate the file. (If I see "cár", I'd be able to "cat
cár" and see it, reliably.) I don't know who would actually normalize
filenames, though--a shell can't just normalize all args (not all args
are filenames) and doing it in all tools would be unreliable.

A mandatory normalization form would also eliminate visibly duplicate
filenames. Of course, it can't be enforced, but tools that escape
filenames for output could change unnormalized text to \u/\U.

I don't quite understand the scenario you're trying to describe, though.

-- 
Glenn Maynard

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
