Kaixo!

On Thu, Sep 19, 2002 at 02:28:13PM +0200, Keld Jørn Simonsen wrote:
> I know that for DOS/windows file systems you can set a charset 
> for the global system. Maybe you should also be able to
> do that for the native linux systems on a filesystem wide
> basis. 

No, you cannot.
For DOS/win3 you can do it because they are single-user systems.

Note also that DOS/win3 *don't* set any charset in the file system; they
just use one, without recording any information about it (exactly as Linux
does, in fact; but since Linux is multi-user, two users may well use
different encodings...).

Win95 and up *do* set a charset for the filesystem, always the same one
(it is utf-16, I think); knowing that it is always the same makes it
possible to convert it, if needed.

It would be possible to do something similar for Linux too: always use
utf-8 as the low-level encoding, then modify the system libraries
so that the displayed name is locale dependent.
However, while technically possible, it would break POSIX, so it is not
a socially acceptable solution.
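
To make the idea above concrete, here is a minimal sketch in Python of what
such a library-level wrapper would have to do. It assumes the on-disk
encoding is fixed to utf-8; the names FS_ENCODING, to_fs_name and
from_fs_name are hypothetical, made up for illustration, and are not part
of any existing library.

```python
# Hypothetical sketch: names are always stored as UTF-8 on disk, and the
# library converts to/from the encoding of the calling program's locale.
FS_ENCODING = "utf-8"  # assumption: one fixed low-level encoding


def to_fs_name(name: bytes, locale_encoding: str) -> bytes:
    """Convert a name typed in the user's locale encoding to the on-disk form."""
    return name.decode(locale_encoding).encode(FS_ENCODING)


def from_fs_name(fs_name: bytes, locale_encoding: str) -> bytes:
    """Convert an on-disk name to the user's locale encoding for display.

    Characters the locale encoding cannot represent become '?', which is
    one reason such a scheme is lossy.
    """
    return fs_name.decode(FS_ENCODING).encode(locale_encoding, errors="replace")
```

Note that two different on-disk names can map to the same displayed name
once unrepresentable characters are replaced, so the conversion cannot be
reversed in general.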
 
> I understand that there is no charset or locale attribute
> per se in linux/unix/posix filesystems and APIs.

There isn't one in other fs either (afaik); but some mandate a given
encoding, and all tools and libraries that handle them are written with
that assumption in mind.
 
> And the kernel does not know which charset the directory
> entries are in. Or does the kernel know that for 
> the dos/win fs? 

No, you have to tell it (the "codepage" mount option).
For vfat (win95 and up) and for joliet CDs, it does know, as those fs
only accept one and only one encoding.

> If so you could also have kernel
> knowledge of native fs charset. And wrapper APIs from
> userland could convert to the fs charset from a locale/
> charset of the running program before calling the actual 
> kernel API. 

See above.
It is technically possible.
But it won't be accepted socially.

> I think we may be heading for a mess if we don't do something 
> like that.

There *will be* a mess, indeed.

What will more likely happen is that Linux distributions will include
a special conversion tool, so that when the switch to utf-8 is made
the filenames can be converted (you tell the tool your old encoding).
Good tools will be able to work on directories and files instead of whole
partitions, allowing for different legacy encodings in different
directories under /home.
Good tools will also be able to skip a filename that already is
a valid utf-8 sequence.
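
As a rough illustration of such a tool, here is a minimal sketch in Python.
It is not an existing utility; convert_name and convert_tree are
hypothetical names, and the legacy encoding is whatever the user says it
is. It works on one directory tree at a time and leaves names that are
already valid utf-8 untouched.

```python
import os


def convert_name(raw: bytes, legacy_encoding: str) -> bytes:
    """Return the UTF-8 form of a filename; valid UTF-8 is left as it is."""
    try:
        raw.decode("utf-8")
        return raw  # already a valid UTF-8 sequence (pure ASCII included)
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root: bytes, legacy_encoding: str, dry_run: bool = True):
    """Convert every name under root; return the list of (old, new) paths.

    The walk is bottom-up, so the contents of a directory are renamed
    before the directory itself. With dry_run=True nothing is renamed.
    """
    changes = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = convert_name(name, legacy_encoding)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.lexists(new_path):
                continue  # do not overwrite an existing entry
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

For example, convert_tree(b"/home/someuser", "iso-8859-1") would only list
the renames it would make (the path is just an example). The "already valid
utf-8" test is a heuristic: a legacy name whose bytes happen to form valid
utf-8 will be skipped, even though it was not meant as utf-8.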

However, it will be even more complex than that: some file names may be
referenced in config files and such, and converting the file name may break
the programs...

There will be a mess, and there is no way to avoid it; we can only try to
make it as painless as possible.
(One thing in that direction is making anything new (config files for
a new program, new protocols, etc.) use utf-8 and only utf-8.)

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.stben.be/pablo/           PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]
