On Mon, Aug 10, 2009 at 09:04:37PM +0100, Roger Leigh wrote:
> If having a C.UTF-8 locale always available for system services is
> required for them to fully support UTF-8, then that needs adding to
> glibc.

It would also bring a significant speed increase.  Since just about
everything calls setlocale(), having the locale built in speeds up the
typical process startup sequence by 20%!  And that's 20% of the whole
thing from fork(), through linking, up to getopt(), so it's not a
speedup to sneeze at.

I'm speaking about having the locale supported natively by glibc, of
course; what the udeb does is merely ship a generated locale file.

> For a locale available after /usr is mounted, a simple localedef
> invocation is all that's needed; for all times, after starting init,
> it needs the tables compiling into glibc as for the standard C locale.
> I've been looking at how to do the latter, but I'm not expert with the
> "3-level" locale tables and other glibc internals, so if anyone who
> knows the details of glibc locales could provide me with
> assistance/guidance here, that would be much appreciated.
>
> For reference, this is bug #522776.  This would be great to have as a
> release goal for Squeeze, and (speculatively) a native C UTF-8 locale
> for Squeeze+1 to give us a default pure UTF-8 system from end-to-end.

I'm not an expert on glibc internals either, but a couple of years ago
I researched the issue a bit.  Apparently, there are only two
first-class locales, C and POSIX; all others get loaded from disk.  In
the past, en_US.ISO-8859-1 and ru_RU.KOI8-R were first-class ones as
well, but that's no longer the case.

What I'd propose is making C.UTF-8 built in.

Another possible optimization would be building the table used by the
8-bit isalpha() and friends on the fly for all locales.  Iconving 128
characters is certainly faster than opening a file on disk, and glibc
(sanely) doesn't support character classification that contradicts
Unicode, so this could allow completely nuking the LC_CTYPE files for
all other locales as well.

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.
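
P.S. To make the on-the-fly table idea a bit more concrete, here is a
rough sketch of what I mean.  It is not how glibc does anything
internally: build_alpha_table() is a name I made up for illustration,
and it assumes the caller has already selected some UTF-8 LC_CTYPE
(e.g. via setlocale(LC_CTYPE, "C.UTF-8")) so that iswalpha() knows
about non-ASCII characters.  It pushes every byte of an 8-bit charset
through iconv() and asks the wide-character classifier about the
result.

```c
#include <assert.h>
#include <iconv.h>
#include <locale.h>   /* callers use setlocale() to pick a UTF-8 LC_CTYPE */
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: fill is_alpha[0..255] for the 8-bit charset
 * `charset` by converting each byte to a wide character and asking
 * iswalpha() about it.
 * Returns 0 on success, -1 if iconv does not know the charset. */
static int build_alpha_table(const char *charset, unsigned char is_alpha[256])
{
    assert(charset != NULL);

    iconv_t cd = iconv_open("WCHAR_T", charset);
    if (cd == (iconv_t)-1)
        return -1;

    for (int b = 0; b < 256; b++) {
        char in = (char)b;
        wchar_t wc = 0;
        char *inp = &in;
        char *outp = (char *)&wc;
        size_t inleft = 1, outleft = sizeof wc;

        is_alpha[b] = 0;                    /* unmapped bytes stay "not alpha" */
        iconv(cd, NULL, NULL, NULL, NULL);  /* reset conversion state */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1
            && inleft == 0)
            is_alpha[b] = iswalpha((wint_t)wc) != 0;
    }

    iconv_close(cd);
    return 0;
}
```

A real implementation would only need to convert the upper 128 bytes,
since the ASCII half is the same everywhere, and would fill the other
classes (isupper, isdigit, ...) in the same pass.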