On Fri, 7 Sep 2001, Michael B. Allen wrote:
> On Fri, Sep 07, 2001 at 10:21:14PM +0200, Bruno Haible wrote:
> > > I gather that I can only assume that wchar_t is just a sequence of UCS
> > > codes of sizeof(wchar_t) in size.

You can assume that only if the macro __STDC_ISO_10646__ is defined by the
C compiler. Under Linux, this is the case starting with glibc 2.2.
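
A minimal sketch of such a check, for illustration only. The function
names wchar_is_ucs and report_wchar_encoding are made up for this example;
only the macro __STDC_ISO_10646__ comes from the C standard:

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 (UCS) code points, 0 otherwise.  (Hypothetical helper.) */
static int wchar_is_ucs(void)
{
#ifdef __STDC_ISO_10646__
  return 1;
#else
  return 0;
#endif
}

/* Prints what the compiler and C library promise about wchar_t. */
static void report_wchar_encoding(void)
{
#ifdef __STDC_ISO_10646__
  /* The macro expands to a yyyymmL date naming the supported
     revision of ISO 10646. */
  printf("wchar_t is UCS (ISO 10646 as of %ld), %u bytes wide\n",
         (long) __STDC_ISO_10646__, (unsigned) sizeof(wchar_t));
#else
  puts("wchar_t encoding is implementation-defined here");
#endif
}
```

Code that depends on wchar_t holding UCS values could call wchar_is_ucs()
once at startup and fall back to iconv if it returns 0.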

> Well, I have to normalize to something!

Use iconv to convert to UTF-8 or UTF-16 before you write into data streams
that programs other than yours have to read in a locale-independent way.
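
A rough sketch of what such a conversion could look like with the
iconv_open/iconv/iconv_close calls. The helper name to_utf8 and its
buffer convention are assumptions made for this example, and it converts
the whole string in one call, so it simply fails if the output buffer is
too small:

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts the NUL-terminated string `in`, encoded as `from_charset`,
   to UTF-8 in `out` (capacity `outsize`, including the trailing NUL).
   Returns the number of UTF-8 bytes written (not counting the NUL),
   or (size_t)-1 on error.  (Hypothetical helper.) */
static size_t to_utf8(const char *from_charset, const char *in,
                      char *out, size_t outsize)
{
  iconv_t cd;
  char *inptr = (char *) in;   /* iconv() wants char **, but only reads */
  size_t inleft = strlen(in);
  char *outptr = out;
  size_t outleft;
  size_t rc;

  if (outsize == 0)
    return (size_t) -1;
  outleft = outsize - 1;       /* reserve room for the trailing NUL */

  cd = iconv_open("UTF-8", from_charset);
  if (cd == (iconv_t) -1)
    return (size_t) -1;        /* conversion not supported */

  rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close(cd);
  if (rc == (size_t) -1)
    return (size_t) -1;        /* invalid input or buffer too small */

  *outptr = '\0';
  return (size_t) (outptr - out);
}
```

For strings that come from the host environment, from_charset would
typically be the name of the locale's encoding rather than a fixed string
such as "ISO-8859-1".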

> So if I get a string from the host environment (as opposed to a binary
> file format or network message), I use the locale-dependent encoding
> unless otherwise instructed by a particular library.

Correct.

> But now, how do I find out what that encoding is?

Read

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate

> If I do:
>
>   printf("%s\n", nl_langinfo(CODESET));
>
> I get:
>
>   ANSI_X3.4-1968
>
> but locale charmap reports ISO-8859-1.

Most likely, you forgot to tell the C library to initialize the locale.
Add something like the following at the start of your program:

  if (!setlocale(LC_CTYPE, "")) {
    fprintf(stderr, "Can't set the specified locale! "
            "Check LANG, LC_CTYPE, LC_ALL.\n");
    return 1;
  }

If you are only interested in the charset aspects of the locale,
I recommend using setlocale(LC_CTYPE, "") rather than
setlocale(LC_ALL, ""), because the former is far more efficient.
setlocale(LC_ALL, "") causes the C library to load lots of files from
/usr/lib/locale, one for each locale category. If you are not
interested in locale-dependent date/time/money/message formatting
or sorting, loading only the LC_CTYPE part also clutters strace
output much less.
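
Putting the two pieces together, the encoding lookup could be wrapped
roughly as follows. This is only a sketch, and the helper name
locale_codeset is invented for this example:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Returns the name of the character encoding of the locale selected by
   the environment (LC_ALL, LC_CTYPE, LANG), or NULL if that locale
   cannot be set.  The returned string belongs to the C library and may
   be overwritten by later setlocale() or nl_langinfo() calls.
   (Hypothetical helper.) */
static const char *locale_codeset(void)
{
  /* Initialize only the LC_CTYPE category from the environment. */
  if (!setlocale(LC_CTYPE, ""))
    return NULL;
  return nl_langinfo(CODESET);
}
```

Without the setlocale call the program stays in the "C" locale, which is
why nl_langinfo(CODESET) reported ANSI_X3.4-1968 above.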

> Err, not with RH 6.2 glibc-2.1.3-15.

Any Linux user or developer interested in locales and character sets
is *strongly* advised to upgrade to a glibc 2.2-based
distribution. There have been huge improvements between 2.1 and 2.2!

> Your freshmeat link:
>
> http://clisp.cons.org/~haible/packages-libiconv.html
>
> is borken.

In fact, all of

  http://clisp.cons.org/~haible/

has vanished ... :-(  Bruno, where art thou?

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
