Kaixo!

On Thu, Sep 19, 2002 at 02:28:13PM +0200, Keld Jørn Simonsen wrote:

> I know that for DOS/windows file systems you can set a charset
> for the global system. Maybe you should also be able to
> do that for the native linux systems on a filesystem wide
> basis.

No, you cannot. For DOS/win3 you can do it because they are single-user.
Note also that DOS/win3 *don't* set any charset in the file system, they
just use one, without any special information (exactly like Linux does, in
fact; but since Linux is multi-user, it may happen that two users use
different encodings...).
Win95 and up *do* set a charset for the filesystem, always the same
(it is utf-16 I think); knowing it is always the same makes it possible
to convert it, if needed.

It would be possible to do a similar thing for Linux too, always using
utf-8 as the low-level encoding, then modifying the system libraries so
that the displayed name is locale dependent.
However, while technically possible, it would break POSIX, so it is not
a socially acceptable solution.

> I understand that there is no charset or locale attribute
> per se in linux/unix/posix filesystems and APIs.

There isn't one in other filesystems either (afaik); but some mandate a
given encoding, and all the tools and libraries that handle them are
written with that assumption in mind.

> And the kernel does not know which charset the directory
> entries are in. Or does the kernel know that for
> the dos/win fs?

No, you have to tell it (the "codepage" mount option).
For vfat (win95 and up) and for joliet CDs, it does know, as those
filesystems accept one and only one encoding.

> If so you could also have kernel
> knowledge of native fs charset. And wrapper APIs from
> userland could convert to the fs charset from a locale/
> charset of the running program before calling the actual
> kernel API.

See above. It is technically possible. But it won't be accepted socially.

> I think we may be heading for a mess if we don't do something
> like that.

There *will be* a mess, indeed.

What will more likely happen is that Linux distributions will include
a special conversion tool so that when the switch to utf-8 is done,
the filenames can be converted (you tell the tool your old encoding).
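
To make the idea of such a conversion tool concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an existing distribution tool: the function names (`is_valid_utf8`, `convert_name`, `convert_tree`) are hypothetical, the legacy encoding is whatever the user passes in, and names that already decode as utf-8 are left untouched. Note that this check is a heuristic: a legacy-encoded name can, by coincidence, also be a valid utf-8 byte sequence.

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw filename bytes already decode as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_name(raw: bytes, legacy_encoding: str) -> bytes:
    """Re-encode one filename from the legacy encoding to UTF-8.

    Names that are already valid UTF-8 (including plain ASCII) are
    returned unchanged. A name that cannot be decoded with the given
    legacy encoding raises UnicodeDecodeError.
    """
    if is_valid_utf8(raw):
        return raw
    return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root: str, legacy_encoding: str, dry_run: bool = True):
    """Walk one directory tree and rename entries to UTF-8.

    Returns the list of (old_path, new_path) pairs as bytes. With
    dry_run=True (the default) nothing is renamed on disk.
    """
    renames = []
    # Passing bytes to os.walk makes it yield raw byte names, so no
    # locale-dependent decoding happens behind our back. Walking
    # bottom-up means a directory is renamed only after its contents
    # have been handled.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            new_name = convert_name(name, legacy_encoding)
            if new_name == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, new_name)
            if os.path.lexists(new):
                # Assumption: never overwrite an existing entry.
                continue
            renames.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return renames
```

A cautious way to use it would be to call `convert_tree("/home/someuser", "iso-8859-1")` first with the default dry run, look at the returned pairs, and only then run it again with `dry_run=False`. It works per directory tree, so different trees can be given different legacy encodings.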

Good tools will be able to work on directories and files instead of whole
partitions, allowing for different legacy encodings for different
directories under /home.
Good tools will also be able to leave alone a filename that already is a
valid utf-8 sequence.

However, it will be even more complex than that: some file names may be
referenced in config files and such, and converting the file name may
break the programs...

There will be a mess, there is no way to avoid it; we can only try
to make it as painless as possible
(and one thing in that direction is making anything new (config files
for a new program, new protocols, etc.) use utf-8 and only utf-8).

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.stben.be/pablo/    PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]
