On Mon, Oct 02, 2000 at 15:32:20 +0200, Bruno Haible wrote:
> Byrial Jensen writes:
> 
> > @@ -160,6 +170,19 @@ int main(int argc, char **argv)
> >  #ifdef ENABLE_NLS
> >      bindtextdomain(PACKAGE, LOCALEDIR);
> >      textdomain(PACKAGE);
> > +#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
> > +   /*
> > +    * GNU libc 2.2 will convert all translated messages from gettext()
> > +    * to what it thinks is the current output character set. The default
> > +    * depends on the LC_CTYPE locale, but we cannot permanently set this
> > +    * as it would affect all isXXXXX() calls all over the program --
> > +    * so we have to bind the default charset to the right value instead.
> > +    */
> > +    setlocale (LC_CTYPE, "");
> > +    bind_textdomain_codeset (PACKAGE, nl_langinfo(CODESET));
> > +    bind_textdomain_codeset ("libc", nl_langinfo(CODESET));
> > +    setlocale (LC_CTYPE, "C");
> > +#endif
> >  #endif
> 
> This will nearly work. But not completely, because glibc's gettext function
> needs the LC_CTYPE locale for the codeset _and_ for the transliteration.
> You are only setting the codeset.

It would make sense to me if the language of the text influenced
the transliteration, but I don't understand why or how the LC_CTYPE
locale, which determines the receiving codeset, influences it.

But my test program confirms that it does. Would anyone please
explain to me what happens in the following example?

The Danish letter "å" is transliterated to "aa" when LC_CTYPE is
"C" and to "ae" when LC_CTYPE is "da_DK". As a Dane I would say
that "aa" is always the correct ASCII transliteration of "å", so I
really don't understand what is happening.

$ cat loktest4.c
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <libintl.h>

int main (int argc, char *argv[])
{
  if (argc != 2)
  {
    printf ("Usage: %s  CODESET\n", argv[0]);
    return 1;
  }

  setlocale (LC_CTYPE, "");
  printf ("LC_CTYPE locale is \"%s\"\n", setlocale (LC_CTYPE, NULL));

  bind_textdomain_codeset ("libc", argv[1]);
  printf ("Textdomain \"%s\" is set to codeset \"%s\"\n", "libc", argv[1]);

  printf ("strerror (ENOENT) = \"%s\"\n", strerror (ENOENT));
  printf ("\n");
  return 0;
}
$ gcc -Wall -o loktest4 -static loktest4.c
$ env -i ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "No such file or directory"

$ env -i LANG=da ./loktest4 iso-8859-1
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "iso-8859-1"
strerror (ENOENT) = "Ingen sådan fil eller filkatalog"

$ env -i LANG=da ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saadan fil eller filkatalog"

$ env -i LANG=da LC_CTYPE=da_DK ./loktest4 ASCII
LC_CTYPE locale is "da_DK"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saedan fil eller filkatalog"

$


> Moreover, nl_langinfo is not completely portable. bind_textdomain_codeset
> will also be contained in the next standalone gettext package, thus
> HAVE_BIND_TEXTDOMAIN_CODESET will be true even on old platforms with gettext,
> and your code won't compile.
> 
> I would therefore favour the opposite approach: Simply use
> 
>       setlocale (LC_CTYPE, "");
> 
> and simulate the isXXXXX() calls with substitutes specific to the C locale.
> Take for example the files [1] and [2], specially optimized for the C locale.

Thanks for the advice. I will consider it, but I would be happier
with a solution that doesn't require changing code all over the
program, if that can be done correctly.

-- 
Byrial
http://home.worldonline.dk/~byrial/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
