Re: Linux and UTF8 filenames

Glenn Maynard Fri, 20 Sep 2002 13:22:35 -0700

On Fri, Sep 20, 2002 at 02:29:06PM +0200, Bruno Haible wrote:
> This is a non-issue. All locale encodings used on Linux, from
> ISO-8859-* over BIG5 to GB18030, use the bytes 0x2f and 0x00 only
> for '/' and '\0' respectively.
> 
> The '/' is a problem with ISO-2022 based encodings, but noone with
> a brain in his head uses them as locale encodings.


Those characters aren't the problem.  (I said that it needs to be
8-bit-clean *except* for those characters--filenames don't need to be
able to store them.)

Suppose the filesystem stores ISO-8859-1 and is mounted to look like
UTF-8.  Even ignoring characters that don't fit in ISO-8859-1 (a major
problem in itself), you still have to honor the rule that "if strcmp()
says two filenames are different, then they are two different
filenames".  Many UTF-8 strings can convert to the same ISO-8859-1
filename (combining characters, etc.), so the conversion breaks that
rule.  Never mind the question of what happens when you throw an
invalid UTF-8 sequence at it ...
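
To make the collision concrete, here is a minimal sketch in Python.  The
function utf8_to_latin1_name() is hypothetical: it stands in for whatever
conversion a mount option would apply, and I'm assuming it normalizes to
NFC before encoding.  A real implementation could differ.

```python
import unicodedata


def utf8_to_latin1_name(name_utf8: bytes) -> bytes:
    """Hypothetical conversion layer (an assumption, not any real mount option):
    decode UTF-8, compose combining sequences (NFC), encode as ISO-8859-1."""
    text = name_utf8.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return unicodedata.normalize("NFC", text).encode("iso-8859-1")


# Two names that strcmp() would report as different ...
precomposed = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'  (U+00E9)
decomposed = "cafe\u0301".encode("utf-8")  # b'cafe\xcc\x81' (e + U+0301)

# ... that end up as the same on-disk ISO-8859-1 name.
on_disk_a = utf8_to_latin1_name(precomposed)  # b'caf\xe9'
on_disk_b = utf8_to_latin1_name(decomposed)   # b'caf\xe9'

# A byte string that is a fine ISO-8859-1 name but not valid UTF-8.
invalid_utf8 = b"caf\xe9"
```

Passing invalid_utf8 to utf8_to_latin1_name() raises UnicodeDecodeError,
which is the "what happens then?" case: the layer has to reject the name,
mangle it, or pass it through, and none of those is 8-bit-clean.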

The same problems happen in reverse (UTF-8 filenames on the FS,
converted to ISO-8859-1).

Of course, some of the rules are ignored by filesystems using these
conversions (e.g. FAT), but ext2 can't do that.

(I don't see why it's useful, anyway; if you want to categorically
recode all of your filenames, a quick script will do it.  This doesn't
need FS support.)
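
A rough sketch of such a script, in Python.  The names recode_name() and
recode_tree() are made up for illustration, and the default of
ISO-8859-1 to UTF-8 is an assumption about what you'd want to convert.

```python
import os


def recode_name(name: bytes, src: str = "iso-8859-1", dst: str = "utf-8") -> bytes:
    """Reinterpret a filename's bytes from encoding src to encoding dst."""
    return name.decode(src).encode(dst)


def recode_tree(root: bytes, src: str = "iso-8859-1", dst: str = "utf-8") -> list:
    """Rename every entry under root; returns a list of (old, new) path pairs.

    root is given as bytes so that os.walk() yields raw byte names.
    """
    renamed = []
    # Bottom-up, so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = recode_name(name, src, dst)
            if new == name:
                continue  # pure ASCII names come out unchanged
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Usage would be something like recode_tree(b"/some/dir").  Note that it
is not safe to run twice: ISO-8859-1 can decode any byte string, so a
second pass would re-encode names that are already UTF-8.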

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
