On Fri, Mar 30, 2007 at 11:46:12AM -0400, Rich Felker wrote:
> What does “supports the encoding” mean? Applications cannot select the
> locale they run in, aside from requesting the “C” or “POSIX” locale.

This isn't so. First of all, see the manual page setlocale(3), as well as the documentation of newlocale(), uselocale() and the *_l() functions (there are no man pages for them; use Google). These show you how to switch to an arbitrary installed locale, regardless of your environment variables. Second, to perform charset conversion you don't need locales at all; you only need the iconv_open(3) and iconv(3) library calls. Yes, glibc provides a function to convert between two arbitrary character sets, even if the locale in effect uses a third, different charset.

> It’s the decision of the user and/or the system implementor. In fact
> it would be impossible to switch locales when visiting different pages
> anyway.

No, it's not impossible, and actually it's unneeded.

Just as a curiosity: I wrote a menu generator for our distribution. It loads the application menu from the .desktop files under /usr/share/applications and outputs menu files for various window managers, such as IceWM, Window Maker, Enlightenment and so on. The input .desktop files contain the names of the programs in multiple languages. Simple window managers expect the menu file to contain them in only one language, the one you want to see. Hence this program outputs plenty of configuration files, one for each window manager and each language (icewm.en, icewm.hu, windowmaker.en, windowmaker.hu and so on).

The entries are sorted alphabetically, but the rules of alphabetical sorting differ from language to language. Hence I have to use many locales: before dumping icewm.en, I switch to an English locale and perform the sorting there; before dumping icewm.hu, I activate the Hungarian locale; and so on.

Earlier versions of this program even included UTF-8 -> 8-bit conversions (.desktop files use UTF-8, while in those early days our distro still used old-fashioned 8-bit locales), and this 8-bit charset again differed from language to language.
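A minimal sketch in C of the two techniques mentioned above, assuming a POSIX.1-2008 system such as glibc: sorting by a given language's collation rules with newlocale()/uselocale() (per thread, without calling setlocale()), and charset conversion with iconv(3), which needs no locale at all. The helper names sort_in_locale() and convert() are hypothetical and are not taken from the uhu-menu sources.

```c
#define _POSIX_C_SOURCE 200809L /* newlocale(), uselocale(), locale_t */
#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: strcoll() follows the LC_COLLATE rules of the
 * calling thread's current locale, which uselocale() may have changed. */
static int cmp_coll(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort `names` by the collation rules of `locname`
 * (e.g. "hu_HU.UTF-8") for this thread only, leaving the process-wide
 * locale set by setlocale() untouched. Returns -1 if the locale cannot
 * be loaded (e.g. it is not installed). */
int sort_in_locale(const char *locname, const char **names, size_t n)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, locname, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;
    locale_t old = uselocale(loc);   /* switch this thread only */
    qsort(names, n, sizeof *names, cmp_coll);
    uselocale(old);                  /* restore the previous locale */
    freelocale(loc);
    return 0;
}

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * charset `from` to charset `to` with iconv(3); no locale is involved.
 * Writes at most outsz-1 bytes plus a NUL into `out`. Returns -1 on
 * failure, e.g. if the output buffer is too small (E2BIG) or, on glibc,
 * if a character has no equivalent in `to` (EILSEQ). */
int convert(const char *from, const char *to, const char *in,
            char *out, size_t outsz)
{
    if (outsz == 0) {
        errno = EINVAL;
        return -1;
    }
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in;          /* iconv() takes char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;      /* keep room for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)            /* flush shift state, if any */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    int saved = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved;
        return -1;
    }
    *outp = '\0';
    return 0;
}
```

For example, a generator like the one described could call sort_in_locale("hu_HU.UTF-8", ...) before writing icewm.hu, and convert("UTF-8", "ISO-8859-2", ...) when an 8-bit Latin-2 output file is wanted; both calls assume that locale and that charset converter are installed on the system. Using uselocale() rather than setlocale() keeps the switch local to the calling thread.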
So, for example, when dumping icewm.fr I converted the French descriptions to Latin-1, but when dumping icewm.hu they had to be converted to Latin-2. In newer versions this part of the code has been dropped since, luckily, UTF-8 is now used in the generated files. In case you're interested, here's the source: ftp://ftp.uhulinux.hu/sources/uhu-menu/

> How would you deal with multiple browser windows or tabs, or even frames?

I can't see any problem here. Can you? Browsers work correctly, don't they? You're asking me how I'd implement a feature that _is_ implemented in basically every browser. I guess your browser handles frames and tabs with different charsets correctly, doesn't it? Even if you run it in an 8-bit locale.

One possible way is to convert each separate input stream (e.g. an HTML page or frame) from its own encoding to a common internal representation (most likely UTF-8). In practice there are some minor issues that make this more complicated (e.g. the charset declaration can be inside the HTML file itself), but in principle there's no problem at all.

> Normal implementations work either by converting all data to the
> user’s encoding, or by converting it all to some representation of
> Unicode (UTF-8 or UTF-32, or something nonstandard like UTF-21).

Normal implementations work the second way, that is, they use a Unicode-compatible internal encoding. From the user's point of view there's only one difference between the two approaches: with the first, characters not present in your current locale's charset are lost; with the second, they are kept and displayed correctly. Hence I still can't see any reason for choosing the first approach (except for terminal applications, which have to stick to the terminal charset).

--
Egmont

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/