On Mon, Oct 02, 2000 at 15:32:20 +0200, Bruno Haible wrote:
> Byrial Jensen writes:
>
> > @@ -160,6 +170,19 @@ int main(int argc, char **argv)
> > #ifdef ENABLE_NLS
> > bindtextdomain(PACKAGE, LOCALEDIR);
> > textdomain(PACKAGE);
> > +#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
> > + /*
> > + * GNU libc 2.2 will convert all translated messages from gettext()
> > + * to what it thinks is the current output character set. The default
> > + * depends on the LC_CTYPE locale, but we cannot permanently set this
> > + * as it would affect all isXXXXX() calls all over the program --
> > + * so we have to bind the default charset to the right value instead.
> > + */
> > + setlocale (LC_CTYPE, "");
> > + bind_textdomain_codeset (PACKAGE, nl_langinfo(CODESET));
> > + bind_textdomain_codeset ("libc", nl_langinfo(CODESET));
> > + setlocale (LC_CTYPE, "C");
> > +#endif
> > #endif
>
> This will nearly work. But not completely, because glibc's gettext function
> needs the LC_CTYPE locale for the codeset _and_ for the transliteration.
> You are only setting the codeset.
It would make sense to me if the language of the text influenced
the transliteration, but I don't understand why or how the LC_CTYPE
locale, which determines the receiving codeset, influences it.
But my test program confirms that it does. Would anyone please
explain to me what happens in the following example?
The Danish letter "å" is transliterated to "aa" when LC_CTYPE is
"C" and to "ae" when LC_CTYPE is "da_DK". As a Dane I would say
that "aa" is always the correct ASCII transliteration of "å", so I
really don't understand what is happening.
$ cat loktest4.c
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <libintl.h>
int main (int argc, char *argv[])
{
    if (argc != 2)
    {
        printf ("Usage: %s CODESET\n", argv[0]);
        return 1;
    }
    setlocale (LC_CTYPE, "");
    printf ("LC_CTYPE locale is \"%s\"\n", setlocale (LC_CTYPE, NULL));
    bind_textdomain_codeset ("libc", argv[1]);
    printf ("Textdomain \"%s\" is set to codeset \"%s\"\n", "libc", argv[1]);
    printf ("strerror (ENOENT) = \"%s\"\n", strerror (ENOENT));
    printf ("\n");
    return 0;
}
$ gcc -Wall -o loktest4 -static loktest4.c
$ env -i ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "No such file or directory"
$ env -i LANG=da ./loktest4 iso-8859-1
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "iso-8859-1"
strerror (ENOENT) = "Ingen sådan fil eller filkatalog"
$ env -i LANG=da ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saadan fil eller filkatalog"
$ env -i LANG=da LC_CTYPE=da_DK ./loktest4 ASCII
LC_CTYPE locale is "da_DK"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saedan fil eller filkatalog"
$
> Moreover, nl_langinfo is not completely portable. bind_textdomain_codeset
> will also be contained in the next standalone gettext package, thus
> HAVE_BIND_TEXTDOMAIN_CODESET will be true even on old platforms with gettext,
> and your code won't compile.
>
> I would therefore favour the opposite approach: Simply use
>
> setlocale (LC_CTYPE, "");
>
> and simulate the isXXXXX() calls with substitutes specific to the C locale.
> Take for example the files [1] and [2], specially optimized for the C locale.
Thanks for the advice. I will consider it, but I would be happier
with a solution that doesn't require changing code all over the
program, if that can be done correctly.
--
Byrial
http://home.worldonline.dk/~byrial/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/