Magnificent - thank you - that sorts everything out! It certainly seems safer to assume that server filesystems use UTF8 (and to encourage administrators to set up their systems appropriately), especially since (a) filenames with accented characters will be rare anyway and (b) "Rêve" mangled to "RÃªve" doesn't break anything, it just looks a little odd.
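
To make point (b) concrete, here is a minimal Python sketch of the two failure directions discussed in this thread. It is only an illustration, not anything the server itself does: the variable names are hypothetical, and it assumes the filename is available as raw bytes. UTF-8 bytes read as Latin-1 merely look odd, whereas Latin-1 bytes read as strict UTF-8 are rejected outright unless the program uses an escaping scheme (Python's "surrogateescape" handler stands in here for the kind of escaping "ls" is said to do below).

```python
# Illustration only: how one filename behaves under the two encodings.
name = "Rêve"

# Case 1: name stored as UTF-8, displayed by a Latin-1 client.
utf8_bytes = name.encode("utf-8")        # 'ê' becomes two bytes: C3 AA
mangled = utf8_bytes.decode("latin-1")   # every byte maps to some character
print(mangled)                           # odd-looking, but nothing fails

# Case 2: name stored as Latin-1, read by a strict UTF-8 program.
latin1_bytes = name.encode("latin-1")    # 'ê' is the single byte EA
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False                   # EA is not followed by a continuation byte
print(valid_utf8)

# An escaping decoder keeps the undecodable byte so the name can still be
# round-tripped back to the exact bytes on disk.
escaped = latin1_bytes.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")
print(roundtrip == latin1_bytes)
```

The asymmetry is the reason for preferring UTF-8 on the server: the worst case for a Latin-1 client is cosmetic, while the reverse case can make a file unreachable from a strict UTF-8 program.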
Sorry I can't thank you in Basque...

- Martin.

At 12:07 19/09/02 +0200, Pablo Saratxaga wrote:
>Kaixo!
>
>On Thu, Sep 19, 2002 at 08:25:16AM +0100, Martin Kochanski wrote:
>> Thank you for your response....
>>
>> Is a user able to change locales without rebuilding the filesystem?
>
>Yes, of course.
>However, file names will still be encoded the way they were the first time;
>the names are not magically converted if you change your locale.
>That is not possible because the filesystems used in Linux have no information
>about the encoding used; the joliet and vfat ones, however, do carry that info
>(they are *always* in unicode (utf-16 I think)) so it is easy to mount them
>in a way that converts them to the encoding you want (see the "mount"
>man page).
>
>On the other hand, for a client/server system, it is much easier: make it
>*mandatory* in the protocol to use utf-8 for the communication between
>the client and the server.
>
>Then, also encode file names in utf-8 in Unix filesystems, regardless of
>the locale; the file names may look strange when using "ls" or similar
>in the command line under non-UTF-8 locales, but:
>1) the normal thing for a client/server system is to get the info from the
>   client; what is done through the system in the server is irrelevant.
>2) UTF-8 is the future, all newly designed client/server protocols should
>   use it by default. Take also into account that it is highly probable that
>   the whole host system will switch to utf-8 sometime in the future,
>   so storing files in utf-8 just makes it simpler for you.
>
>> If so, then if a user changes locales (for example from Latin-1 to UTF8),
>> does this mean that all existing filenames [with accented letters] suddenly
>> become undisplayable because what was valid as a Latin-1 string is no
>> longer valid as UTF8;
>
>Yes.
>(well, "ls" I think shows escaped sequences, so the files can still be
>manipulated. Gnome2 however simply skips filenames with malformed utf-8,
>making those files impossible to handle :-( )
>
>> or is there some more persistent concept of "the locale of a filesystem"
>> that protects against this problem?
>
>No, that concept doesn't exist.
>It is up to the different programs to handle it, if needed.
>
>"mount" can tell the kernel, when mounting some filesystems that *do*
>specify the encoding (or that use one and only one encoding), how to
>convert in order to have it display correctly for the current locale.
>In a full utf-8 system you won't use that, but mount everything with the
>"utf8" option.
>
>If I were in your case, I would use utf-8 exclusively on the server side;
>if it were a dedicated server, then I would set it to use UTF-8 by default
>for everything.
>
>We are still in a transition period, due to some (mainly command line)
>tools still not cleanly supporting utf-8, and to the large amount of
>data in legacy encodings, but in one or maybe two years it will likely
>become the default encoding used.
>
>--
>Ki ça vos våye bén,
>Pablo Saratxaga
>
>http://chanae.stben.be/pablo/ PGP Key available, key ID: 0xD9B85466
>[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
