On Fri, Mar 30, 2007 at 07:06:52PM +0200, Egmont Koblinger wrote:
> On Fri, Mar 30, 2007 at 11:46:12AM -0400, Rich Felker wrote:
>
> > What does “supports the encoding” mean? Applications cannot select the
> > locale they run in, aside from requesting the “C” or “POSIX” locale.
>
> This isn't so. First of all, see the manual page setlocale(3), as well as
The documentation of setlocale is here:

http://www.opengroup.org/onlinepubs/009695399/functions/setlocale.html

As you’ll see, the only arguments with which you can portably call
setlocale are NULL, "", "C", "POSIX", and perhaps also a string previously
returned by setlocale. I’m interested only in portable applications, not
“GNU/Linux applications”.

> the documentation of newlocale() and uselocale() and *_l() functions (no man
> page for them, use google). These will show you how to switch to arbitrary
> existing locale, no matter what your environment variables are.

These are nonstandard extensions, and a horrible mistake in design
direction. Having the character encoding even be selectable at runtime is
partly a mistake, and should be seen as a temporary measure during the
adoption of UTF-8, allowing legacy apps to continue working until they can
be fixed. In the future we should have much lighter, sleeker, more
maintainable systems without runtime-selectable character encoding.

If you look into the GNU *_l() functions, the majority of them exist
primarily or only because of LC_CTYPE. The madness of a locally bindable
locale would not be so mad if these could all be thrown out, keeping only
the ones that actually depend on cultural customs rather than on character
encoding. However, I suspect even then it’s a mistake. Applications that
just need to present data to the user in a form matching the user’s
cultural expectations are fine with a single global locale. Applications
that need to deal with multiple cultures’ expectations simultaneously
probably need much stronger functionality than the standard library
provides anyway, and would do best to use their own specialized machinery
(possibly in library form).

> Second, in order to perform charset conversion, you don't need locales at
> all, you only need the iconv_open(3) and iconv(3) library calls.
> Yes, glibc provides a function to convert between two arbitrary character
> sets, even if the locale in effect uses a third, different charset.

Yes, I’m well aware. This is not specific to glibc but part of the
standard. There is no standard on which character encodings should be
supported (which is a good thing, since eventually they can all be
dropped; and even before then, non-CJK systems may wish to omit the large
tables for legacy CJK encodings), nor on the names for the encodings
(which is rather stupid; it would be very reasonable and practical for SUS
to mandate that, if an encoding is supported, it must be supported under
its standard preferred MIME name). The standard also does not necessarily
guarantee a direct conversion from A to C, even if conversions from A to B
and from B to C exist.

> file to contain them in only one language, the one you want to see. Hence
> this program outputs plenty of configuration file, one for each window
> manager and each language (icewm.en, icewm.hu, windowmaker.en,
> windowmaker.hu and so on).

It would be nice if these apps used some sort of message catalog for their
menus, and performed the sorting themselves at runtime.

> Just in case you're interested, here's the source:
> ftp://ftp.uhulinux.hu/sources/uhu-menu/

You could use setlocale instead of the *_l() stuff so it would be portable
to non-glibc systems. For a normal user application I would say this is an
abuse of locales to begin with, and that it should use its own collation
data tables, but what you’re doing seems reasonable for a system-specific
maintenance script. The code looks nice: clean use of plain C without huge
bloated frameworks.

> > How would you deal with multiple browser windows or tabs, or even
> > frames?
>
> I can't see any problem here. Can you? Browsers work correctly, don't they?
> You ask me how I'd implement a feature that _is_ implemented in basically
> any browser.
> I guess your browser handles frames and tabs with different
> charset correctly, doesn't it? Even if you run it with an 8-bit locale.

I meant that you'd run into trouble if you were to change the locale for
each page. Obviously it works if you don't use the locale system.

> > Normal implementations work either by converting all data to the
> > user’s encoding, or by converting it all to some representation of
> > Unicode (UTF-8 or UTF-32, or something nonstandard like UTF-21).
>
> Normal implementations work the 2nd way, that is, use a Unicode-compatible
> internal encoding.

Links works the other way: converting everything to the selected character
encoding. Crappy versions of Links (including the popular GUI one) only
support 8-bit codepages, but recent ELinks supports UTF-8.

> From the user's point of view there's only one difference
> between the two ways. Using the 1st way characters not present in your
> current locale are lost. Using the 2nd way they are kept and displayed
> correctly. Hence I still can't see any reason for choosing the 1st way
> (except for terminal applications that have to stick to the terminal
> charset).

Also applications that want to interact with other applications on the
system that expect to receive text, e.g. an external text editor or
similar.

Rich

-- 
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/