Kaixo!

On Wed, Feb 20, 2002 at 03:28:56PM +0100, Oyvind A. Holm wrote:
> On 2002-02-20 00:12 Markus Kuhn wrote:
> 
> > I just spotted in section 1.1.3 of RFC 3010 (NFS version 4 Protocol)
> > the following requirement: "file and directory names are encoded with
> > UTF-8".
> 
> That's incredibly good news. At last there is some development in
> the mess of 8-bit file names. This is a topic which should be addressed
> actively; the question is at what level translation should be done.

At the same level as for other filesystems that use Unicode internally,
in the same way that you do:

mount /foo /bar -t vfat -o iocharset=koi8-r   (if you are using koi8-r)
mount /foo /bar -t vfat -o iocharset=utf8     (if you have switched to utf-8)

you will do:

mount /foo /bar -t nfs -o iocharset=koi8-r   or
mount /foo /bar -t nfs -o iocharset=utf8
 

(at some point the default, used when that option isn't specified, should
be changed from the current iso8859-1 to utf8)

> Haven't followed this discussion lately, but what is the future of
> 8-bit filenames under Linux? At the moment I have to enter UTF-8
> sequences manually when I want 8-bit filenames.

Of course, if you are using a non-UTF-8 locale you cannot type UTF-8
characters directly. Just switch to a UTF-8 locale and you will be
able to type UTF-8 file names without problems.

> trouble. Especially when moving a directory tree over to cfs (cryptfs)
> on Debian I had some files with Norwegian-specific characters (æøåÆØÅ)
> in it. The files got moved to the encrypted file system, but I'm
> not able to read them because of Debian bug #44516. Well.

That is a completely different thing: cryptfs is not transparent.
I suppose the same problem would arise if you used iso8859-1 names
instead of UTF-8.
 
> So: Do you think it will be a long time until we can enter 8-bit
> characters in programs when saving files,

You already can, and have been able to for about two years at least.

> and it will be stored as UTF-8 in the file system?

ext2 and almost all "traditional" Unix filesystems don't care about
encodings; a name is just bytes. So if you type in UTF-8, the names are
stored in UTF-8.
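
To make the "just bytes" point concrete, here is a small Python sketch
(an illustration only, not anything the filesystem itself runs; the file
name is a made-up example). It shows that the same visible name becomes
two different byte strings depending on the encoding in use when it was
typed, and the filesystem stores whichever one it is given:

```python
# Hypothetical example name "blåbær.txt", written with escapes so the
# snippet does not depend on the encoding of this message.
name = "bl\u00e5b\u00e6r.txt"

# What an iso8859-1 locale would hand to the filesystem:
latin1_bytes = name.encode("iso8859-1")

# What a UTF-8 locale would hand to the filesystem:
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'bl\xe5b\xe6r.txt'
print(utf8_bytes)    # b'bl\xc3\xa5b\xc3\xa6r.txt'

# To ext2 these are simply two unrelated names.
print(latin1_bytes == utf8_bytes)  # False
```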

The problem comes when you have not been using UTF-8 and then decide to
switch, or when you export your directories elsewhere: you have no idea
at all what charset a file name was created with; you can only hope
it will work with the one you are using.

Some other filesystems (Joliet extensions to ISO 9660, vfat, ntfs, ...)
carry some information about the charset used in the filesystem (either
by recording it somewhere, or by making the use of a given charset
mandatory). So, with those filesystems you *know* what letters are part
of a name (they are not just "bytes", they are known letters) and can,
if needed, convert them for display in your encoding.
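
As a rough sketch of what such a conversion amounts to (this is not how
the kernel implements iocharset; the function name convert_name and the
sample file name are assumptions made up for illustration), once the
charset of the stored name is known it can be decoded to letters and
re-encoded for the user's locale:

```python
def convert_name(raw: bytes, fs_charset: str, locale_charset: str) -> bytes:
    """Re-encode a file name from the filesystem charset to the locale charset.

    Letters that do not exist in the target charset are replaced by '?',
    which is one possible policy among several.
    """
    text = raw.decode(fs_charset)  # bytes -> known letters
    return text.encode(locale_charset, errors="replace")


# Hypothetical name (Russian for "file") as it would be stored in koi8-r.
stored = "\u0444\u0430\u0439\u043b".encode("koi8-r")

# The same letters, re-encoded for a UTF-8 locale.
shown = convert_name(stored, "koi8-r", "utf-8")
print(shown.decode("utf-8"))
```

Without the charset information the decode step has nothing to go on,
which is exactly the situation with the "just bytes" filesystems.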

> Or is this an unwanted feature? Thinking about
> the ability to read file names with invalid UTF-8 sequences.

It should be possible, of course (otherwise you wouldn't be able to
delete them). However, you won't be able to *type* those names when in
a UTF-8 locale, and they won't be displayed either.

But that is not very different from the use of control codes in file names.
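
A program can still reach such names as long as it treats them as raw
bytes. The following Python sketch is one way to do that (the helper
names are made up for illustration, and it assumes the entries are
regular files on a filesystem that accepts arbitrary bytes in names):

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw file name decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def list_invalid_names(directory: bytes) -> list:
    """List entries whose names are not valid UTF-8.

    Passing a bytes path makes os.listdir() return bytes names,
    so nothing is decoded (or mangled) on the way.
    """
    return sorted(raw for raw in os.listdir(directory) if not is_valid_utf8(raw))


def remove_invalid_names(directory: bytes) -> list:
    """Delete those entries by their raw byte names; return what was removed."""
    removed = list_invalid_names(directory)
    for raw in removed:
        os.unlink(os.path.join(directory, raw))
    return removed
```

Calling list_invalid_names(b".") first, and only then deciding whether
to rename or remove the entries, is probably the safer order.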

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
