On Fri, Mar 30, 2007 at 02:04:14PM -0400, Rich Felker wrote:

Hi,
> As you’ll see, the only arguments with which you can portably call
> setlocale are NULL, "", "C", "POSIX", and perhaps also a string
> previously returned by setlocale.

You can portably _call_ setlocale() with any argument, as long as you check its return value and properly handle the case where it fails to fulfill your request. The arguments you listed are probably those for which you can always assume setlocale() to succeed. In other cases you can still give it a chance and see whether it succeeds.

> I’m interested only in portable applications, not “GNU/Linux
> applications”.

Our goals differ. Since I'm developing a Linux distro, I'm only interested in developing GNU/Linux applications. We don't have the resources to check the portability of our applications, nor do we want to make our job harder by restricting ourselves to a subset of the available functions and re-implementing what's already implemented in glibc. I don't think the newer features that get implemented in glibc are there only to make it bigger. I think they are there for developers to use when appropriate. They might not be appropriate for a portable application, but they usually are appropriate for our goals.

> > the documentation of newlocale() and uselocale() and *_l() functions
>
> These are nonstandard extensions and are a horrible mistake in design
> direction. Having the character encoding even be selectable at runtime
> is partly a mistake, and should be seen as a temporary measure during
> the adoption of UTF-8 to allow legacy apps to continue working until
> they can be fixed.

No. First of all, they are not about multiple encodings, but about multiple locales. (It seems to me that you slightly mix up locale and encoding. The encoding is only one part of a locale and can be handled independently of the rest.) For example, if you create a German-French dictionary application, it's reasonable to expect German strings to be sorted according to the German alphabet rules, while French words are sorted using the French rules.
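A minimal sketch of the dictionary case with the *_l() interface, assuming glibc or another system implementing POSIX.1-2008 (which later standardized newlocale(), freelocale() and strcoll_l()). The helpers open_collation() and sort_words() are hypothetical, and locale names such as "de_DE.UTF-8" are assumptions, since the installed names vary between systems. When a locale is missing, the sketch falls back to the "C" locale, whose collation is plain byte order.

```c
#define _POSIX_C_SOURCE 200809L  /* newlocale(), strcoll_l(): POSIX.1-2008 */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Open a locale object carrying only the collation rules of `name`.
   If that locale is not installed, fall back to the "C" locale.
   Returns (locale_t)0 only if even the fallback fails. */
static locale_t open_collation(const char *name)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        loc = newlocale(LC_COLLATE_MASK, "C", (locale_t)0);
    return loc;
}

/* Standard qsort() has no context argument, so the comparison function
   reads the locale from a file-scope variable (not thread-safe). */
static locale_t sort_locale;

static int compare_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcoll_l(*wa, *wb, sort_locale);
}

/* Sort `words` by the collation rules of `loc`. The process-wide locale
   set by setlocale() is neither consulted nor changed. */
static void sort_words(const char **words, size_t n, locale_t loc)
{
    assert(loc != (locale_t)0);
    sort_locale = loc;
    qsort(words, n, sizeof *words, compare_words);
}
```

For the dictionary, one would call open_collation("de_DE.UTF-8") for the German column and open_collation("fr_FR.UTF-8") for the French one, sort each list with its own locale object, and release both with freelocale().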
Even if your operating system doesn't support these locales, it might be a reasonable decision for the application to try them and fall back to a default sorting if they aren't available.

> If you look into the GNU *_l() functions, the majority of them exist
> primarily or only because of LC_CTYPE.

It seems to me that the majority of them exist because of cultural differences, and there would still be a need for them even if only UTF-8 existed: different time/date formats, different alphabetical sorting, different lowercase-uppercase mapping etc.

> Applications which need to deal with multinational cultural
> expectations simultaneously probably need much stronger functionality
> than the standard library provides anyway, and would do best to use
> their own (possibly in library form) specialized machinery.

So far the functionality provided by glibc has been sufficient for me, and I would have hated having to use an external library. ;) Anyway, it would really be a bad decision if glibc didn't provide an easy way to access the locale data that originates from glibc and is already accessible via glibc if you set the corresponding locale. The external library you'd like to see would then either need to access the locale data the same way glibc does, or would have to provide the same information again in its own format. Sounds terrible.

An external library is a good approach if some information cannot be extracted from glibc _at all_. For example, glibc doesn't know how many people live in Hungary; it's not part of the locale data. If you need it, you may pick up an external library that tells you this. However, glibc knows how to sort Hungarian strings alphabetically. You claim that it shouldn't let applications access this piece of information unless they have their LANG/LC_* environment variables set to hu_HU or some variant of it.
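To illustrate the "different time/date formats" item above: a sketch that formats a date under an arbitrary locale's LC_TIME rules with strftime_l(), leaving the process locale untouched. It assumes POSIX.1-2008 support (glibc provides it); format_time_in() is a hypothetical helper, and "hu_HU.UTF-8" is an assumed locale name. Creating and freeing a locale object on every call keeps the sketch short; a real program would cache it.

```c
#define _POSIX_C_SOURCE 200809L  /* newlocale(), strftime_l(): POSIX.1-2008 */
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format `tm` with the LC_TIME conventions (month and weekday names,
   date order, ...) of locale `name`, without touching the process-wide
   locale. Falls back to "C" if `name` is not installed.
   Returns the number of bytes written, or 0 on failure, like strftime(). */
static size_t format_time_in(const char *name, const char *fmt,
                             const struct tm *tm, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    locale_t loc = newlocale(LC_TIME_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        loc = newlocale(LC_TIME_MASK, "C", (locale_t)0);
    if (loc == (locale_t)0)
        return 0;
    size_t n = strftime_l(buf, size, fmt, tm, loc);
    freelocale(loc);
    return n;
}
```

With formats such as "%x" or "%A", each installed locale produces its own date order and weekday names, while missing locales quietly get the "C" conventions.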
You say that applications should find a different way (a different library, maybe a different database) to access this data if they need it while the system locale is not Hungarian. This is totally absurd.

> There is no standard on which character encodings should be
> supported (which is a good thing, since eventually they can all be
> dropped.. and even before then, non-CJK systems may wish to omit the
> large tables for legacy CJK encodings),

I don't think support for the current 8-bit encodings will die within the next 50 years, and (as an application developer) if the underlying operating system (its iconv() implementation) doesn't support a particular encoding, I'd happily blame the OS and not think about workarounds. In practice this means that if I need to process data in a particular encoding, I pass this encoding to iconv_open() and cry out loud if it fails. You're right, I can't take it for granted that iconv() supports ISO-8859-1, but still, if I need it, I try it, use it if available, and print an error message otherwise. I won't implement it on my own; the application is not the right place to do it.

> > file to contain them in only one language, the one you want to see. Hence
> > this program outputs plenty of configuration file, one for each window
> > manager and each language (icewm.en, icewm.hu, windowmaker.en,
> > windowmaker.hu and so on).
>
> It would be nice if these apps would use some sort of message catalogs
> for their menus, and if they would perform the sorting themselves at
> runtime.

Yes, that would theoretically be a better solution, but it would require much more work, would be less compatible with other distros, and would make it much harder to adopt new window managers...

> You could use setlocale instead of the *_l() stuff so it would be
> portable to non-glibc.

If porting ever becomes an issue, I can still rewrite it (with autoconf and compile-time conditionals). Using the *_l() functions made the code cleaner and probably faster.
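A sketch of the iconv_open() approach described above ("try it, use it if available, and print an error message otherwise"), with no fallback code of its own. convert_to_utf8() is a hypothetical helper; converting to UTF-8 and doing the whole conversion in one call (so the result must fit in `out`) are simplifications of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from encoding `from` to UTF-8.
   Reports the problem on stderr and returns -1 if iconv() does not know
   `from`, if the input is invalid in that encoding, or if `out` is too
   small. On success, `out` holds a NUL-terminated UTF-8 string. */
static int convert_to_utf8(const char *from, const char *in,
                           char *out, size_t outsize)
{
    assert(outsize > 0);
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1) {
        /* No workaround: the OS simply does not support this encoding. */
        fprintf(stderr, "cannot convert from %s: %s\n", from, strerror(errno));
        return -1;
    }
    char *inp = (char *)in;        /* iconv() takes char ** but does not modify the input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep one byte for the terminating NUL */
    int rc = 0;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {  /* flush shift state */
        fprintf(stderr, "conversion from %s failed: %s\n", from, strerror(errno));
        rc = -1;
    }
    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

A call such as convert_to_utf8("ISO-8859-2", text, buf, sizeof buf) then either succeeds or leaves a clear message blaming the missing encoding support.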
> For a normal user application I would say this
> is an abuse of locales to begin with and that it should use its own
> collation data tables,

Own tables? Why? What's the gain in shipping duplicated data? How are we supposed to create collation tables for all languages? Why do you think it's wrong if glibc allows access to this data and I use it?

> but what you’re doing seems reasonable for a
> system-specific maintainence script. The code looks nice. Clean use of
> plain C without huge bloated frameworks.

Thanks :)

> > I can't see any problem here. Can you? Browsers work correctly, don't they?
> > You ask me how I'd implement a feature that _is_ implemented in basically
> > any browser. I guess your browser handles frames and tabs with different
> > charset correctly, doesn't it? Even if you run it with an 8-bit locale.
>
> I meant you run into trouble if you were going to change locale for
> each page. Obviously it works if you don’t use the locale system.

Well, of course I didn't mean changing the _locale_ either, just converting between _encodings_.

> Links works the other way: converting everything to the selected
> character encoding. Crappy versions of links (including the popular
> gui one) only support 8bit codepages, but recent ELinks supports
> UTF-8.

I know the mainstream version of links is crappy. I haven't checked ELinks yet; I will soon. Does it have a GUI version? In a terminal, as I've said, it's okay if it converts everything to the locale's encoding, since in a terminal it's not possible to display characters outside the default locale's charset. (Except for the \e%G magic...) If it _is_ possible for an application to display characters outside the default locale's charset, IMO it _has_ to do so.

> Also applications that want to interact with other applications on the
> system expecting to receive text, e.g. an external text editor or
> similar.

They might convert the data back to the locale encoding before passing it to the external application.
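A sketch of that conversion step, which also illustrates the point from the start of this reply: setlocale() may be called with the locale named by the environment, as long as its return value is checked. init_user_locale() and utf8_to_locale() are hypothetical helpers, the sketch assumes text is held internally in UTF-8, and replacing unrepresentable characters with '?' is an arbitrary policy (GNU iconv also accepts a nonportable "//TRANSLIT" suffix on the target encoding name).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Adopt the user's locale from LANG/LC_*. If setlocale() cannot honour the
   request it returns NULL and leaves the "C" locale in effect, so this is
   treated as a warning, not a fatal error. */
static void init_user_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "warning: locale not available, using \"C\"\n");
}

/* Convert UTF-8 text to the current LC_CTYPE encoding, as reported by
   nl_langinfo(CODESET), before handing it to an external program.
   Characters the target cannot represent become '?'.
   Returns 0 on success, -1 on error. */
static int utf8_to_locale(const char *in, char *out, size_t outsize)
{
    assert(outsize > 0);
    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;          /* keep one byte for the NUL */
    int rc = 0;
    while (inleft > 0 &&
           iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        if (errno != EILSEQ || outleft == 0) {
            rc = -1;                       /* E2BIG, or EINVAL for truncated input */
            break;
        }
        /* Unrepresentable or malformed character at inp: emit '?' and skip
           the lead byte plus any UTF-8 continuation bytes (10xxxxxx). */
        *outp++ = '?';
        outleft--;
        do {
            inp++;
            inleft--;
        } while (inleft > 0 && ((unsigned char)*inp & 0xC0) == 0x80);
    }
    if (rc == 0)
        iconv(cd, NULL, NULL, &outp, &outleft);   /* flush any shift sequence */
    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

The application keeps displaying the full UTF-8 text itself and only degrades it at the boundary where another program expects the locale's encoding.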
It's no excuse for not displaying them if it's otherwise technically possible.

--
Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/