Re: Linux and UTF8 filenames

Keld Jørn Simonsen Thu, 19 Sep 2002 15:21:26 -0700

On Thu, Sep 19, 2002 at 02:55:09PM +0200, Pablo Saratxaga wrote:
> Kaixo!
> 
> On Thu, Sep 19, 2002 at 02:28:13PM +0200, Keld Jørn Simonsen wrote:
> > I know that for DOS/windows file systems you can set a charset 
> > for the global system. Maybe you should also be able to
> > do that for the native Linux systems on a filesystem-wide
> > basis. 
> 
> No, you cannot.


I know you cannot do it today, but I am discussing how to 
make the transition to utf-8 in the native Linux filesystems.

> For DOS/win3 you can do it because they are mono-user.

You can have multiple users in DOS/Win, and you may even
have one user using different charsets, such as cp437,
cp850, cp865, cp1252 and UCS-2. I know such a user very well...



> Note also that DOS/win3 *don't* set any charset in the file system, they
> just use one, without any special information (exactly like Linux does
> in fact; but Linux being multi-user it may happen that two users use
> different encodings...).
> 
> Win95 and up *do* set a charset for the filesystem, always the same (it is
> utf-16 I think); knowing it is always the same makes it possible to
> convert it, if needed.

I was talking about the Linux kernel's treatment of dos/win fs.
VFAT has a codepage parameter. This is done in the kernel.
If it can be done for one filesystem type, it could also be done for
other fs types. But anyway it might be better to do a conversion
to utf-8 - at least for the file names.

> > If so you could also have kernel
> > knowledge of native fs charset. And wrapper APIs from
> > userland could convert to the fs charset from a locale/
> > charset of the running program before calling the actual 
> > kernel API. 
> 
> See above.
> It is technically possible.
> But it won't be accepted socially.

Why not? The codepage parameter is accepted socially, and
all programs work on these filesystems as well (almost) as
on native Linux fs. I can understand that changing APIs is
not a good way forward, but the "one-charset-per-fs" could
be done. And I could easily imagine scenarios in my ballpark
with different users on the same multiuser Linux machine not
wanting to change over to utf-8 at the same time. Or programs in use
that will not easily convert to utf-8 and thus still have to
run some other encoding. 
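
To make the "one-charset-per-fs" idea concrete, here is a rough sketch
of what such a userland wrapper might do. This is only an illustration
under my own assumptions: the names to_fs_name and open_in_fs_charset
are hypothetical, not an existing API, and a real wrapper would take
the program's charset from its locale rather than from an argument.

```python
import os


def to_fs_name(name: bytes, program_charset: str, fs_charset: str) -> bytes:
    """Re-encode a filename from the program's charset to the fs charset.

    Hypothetical helper: raises UnicodeDecodeError if the bytes are not
    valid in program_charset, and UnicodeEncodeError if a character
    cannot be represented in fs_charset.
    """
    return name.decode(program_charset).encode(fs_charset)


def open_in_fs_charset(name: bytes, flags: int,
                       program_charset: str,
                       fs_charset: str = "utf-8") -> int:
    """Hypothetical wrapper: convert the name, then call the real open."""
    return os.open(to_fs_name(name, program_charset, fs_charset), flags)
```

A program still running in cp865 would then pass its cp865 bytes, and
the name would reach the filesystem as utf-8. Names that cannot be
represented in the fs charset fail with an error instead of being
silently mangled, which is a design choice of this sketch.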

> > I think we may be heading for a mess if we don't do something 
> > like that.
> 
> There *will be* a mess, indeed.
> 
> What will more likely happen, is that Linux distributions will include
> a special conversion tool so that when the switch to utf-8 is done,
> the filenames can be converted (you tell the tool your old encoding).
> Good tools will be able to work on directories and files instead of whole
> partitions, allowing for different possible legacy encodings for different
> directories under /home.
> Good tools will also be able to not convert a filename that already is
> a valid utf-8 sequence.
> 
> However, it will be even more complex than that: some file names may be
> referenced in config files and such, and converting the file name may break
> the programs...
> 
> There will be a mess, there is no way to avoid it; we can only try to make
> it the least painful possible.
> (and one thing in that direction is making anything new (config files for
> a new program, new protocols, etc etc) use utf-8 and only utf-8)

Agreed. To me it is also more a question of how to minimize the damage.
That is probably one of the main issues on this email list.
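
The conversion tool described above could look roughly like the sketch
below. It is only an illustration under assumptions of mine: the
function names utf8_name and convert_tree are made up, the legacy
encoding is given per directory tree by the caller, and the "already
valid utf-8" test is a heuristic, since a legacy-encoded name can by
coincidence also be a valid utf-8 sequence.

```python
import os


def utf8_name(raw: bytes, legacy: str) -> bytes:
    """Return the utf-8 form of a filename given as raw bytes."""
    try:
        raw.decode("utf-8")
        return raw  # already valid utf-8 (this includes pure ASCII)
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")


def convert_tree(top: bytes, legacy: str, dry_run: bool = True):
    """Collect (old, new) renames under top; apply them unless dry_run.

    top is passed as bytes so that os.walk yields raw byte names.
    Walking bottom-up means a directory's contents are handled before
    the directory itself is renamed from its parent.
    """
    changes = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames + dirnames:
            new = utf8_name(name, legacy)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            changes.append((old_path, new_path))
            # Never overwrite an existing entry with the converted name.
            if not dry_run and not os.path.exists(new_path):
                os.rename(old_path, new_path)
    return changes
```

Running it first with dry_run=True gives the list of planned renames,
which could also be used to search config files for references to the
old names before anything is changed.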

Best regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
