On Fri, Sep 20, 2002 at 02:29:06PM +0200, Bruno Haible wrote:
> This is a non-issue. All locale encodings used on Linux, from
> ISO-8859-* over BIG5 to GB18030, use the bytes 0x2f and 0x00 only
> for '/' and '\0' respectively.
>
> The '/' is a problem with ISO-2022 based encodings, but no one with
> a brain in his head uses them as locale encodings.

Those characters aren't the problem. (I said that it needs to be
8-bit-clean *except* for those characters--filenames don't need to be
able to store them.)

If the filesystem is ISO-8859-1, and it's mounted to look like UTF-8:
ignoring the issue of characters that don't fit in ISO-8859-1 (major in
itself), you still have to honor the rule that "if strcmp() says two
filenames are different, then they are two different filenames", and
many UTF-8 strings can convert to the same ISO-8859-1 filename
(combining characters, etc). Never mind the question of what happens
when you throw an invalid UTF-8 sequence at it ...

The same problems happen in reverse (UTF-8 filenames on the FS,
converted to ISO-8859-1).

Of course, some of the rules are ignored by filesystems using these
conversions (e.g. FAT), but ext2 can't do that.

(I don't see why it's useful, anyway; if you want to categorically
recode all of your filenames, a quick script will do it. This doesn't
need FS support.)

-- 
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
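
To make the collision argument in the message above concrete, here is a
minimal sketch in Python. It assumes a recoding layer that normalizes
names to NFC before encoding them as ISO-8859-1; that choice, and the
helper name to_latin1_name, are hypothetical and not taken from any
real filesystem driver. It only shows that two byte-wise different
UTF-8 names can land on the same ISO-8859-1 name, and that an invalid
UTF-8 sequence has no conversion at all.

```python
import unicodedata


def to_latin1_name(utf8_name: bytes) -> bytes:
    """Hypothetical UTF-8 -> ISO-8859-1 filename conversion (an assumption).

    Raises UnicodeDecodeError for invalid UTF-8 input and
    UnicodeEncodeError for characters outside ISO-8859-1.
    """
    text = utf8_name.decode("utf-8")
    # Assumed policy: fold combining sequences into precomposed characters.
    return unicodedata.normalize("NFC", text).encode("iso-8859-1")


# Two names that a byte-wise comparison (strcmp) treats as different:
precomposed = "caf\u00e9".encode("utf-8")  # 'e with acute' as one code point
decomposed = "cafe\u0301".encode("utf-8")  # 'e' followed by a combining acute

names_differ = precomposed != decomposed
same_target = to_latin1_name(precomposed) == to_latin1_name(decomposed)

# An invalid UTF-8 sequence cannot be converted at all.
try:
    to_latin1_name(b"caf\xff")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True
```

Without the normalization step the decomposed name would simply fail to
encode, which is the other half of the same problem: the layer must
either merge distinct names or refuse some of them.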
