Re: Encoding conversions

Michael B. Allen Fri, 07 Sep 2001 14:21:02 -0700
On Fri, Sep 07, 2001 at 10:21:14PM +0200, Bruno Haible wrote:
> > I gather that I can only assume that wchar_t is just a sequence of UCS
> > codes of sizeof(wchar_t) in size.
> 
> You cannot even assume that. wchar_t is locale dependent and
> OS/compiler/vendor dependent. It should never be used for "binary file
> formats and network messages".

Well, I have to normalize to something!

> > But is the in memory representation
> > of a multi-byte string the equivalent of the UTF-8 encoding
> 
> Depends where you got the string. In most cases, like when you got it
> from fgets(stdin), it will be in locale dependent encoding (LC_CTYPE

So if I get a string from the host environment (as opposed to a binary
file format or network message), I use the locale-dependent encoding
unless otherwise instructed by a particular library. But now, how do I
find out what that encoding is? If I do:

  printf("%s\n", nl_langinfo(CODESET));

I get:

  ANSI_X3.4-1968

but locale charmap reports ISO-8859-1.
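
One likely explanation for the mismatch, offered as a guess rather than a
diagnosis: a C program runs in the "C" locale until it calls setlocale(),
and glibc names that locale's codeset ANSI_X3.4-1968 (plain ASCII). The
locale utility does make that call, so it sees the environment's setting.
A minimal sketch, where environment_codeset() is a hypothetical helper
name and not part of any library:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns the codeset name of the LC_CTYPE locale
   selected by the environment (LC_ALL, LC_CTYPE, LANG), or NULL if that
   locale cannot be selected. */
const char *environment_codeset(void)
{
    /* Without this call the program stays in the "C" locale, which is
       why nl_langinfo(CODESET) can report ANSI_X3.4-1968 on glibc. */
    if (setlocale(LC_CTYPE, "") == NULL) {
        return NULL;  /* the environment names a locale that is not installed */
    }
    return nl_langinfo(CODESET);
}
```

If the environment selects an ISO-8859-1 locale, printing the returned
string should then agree with what locale charmap shows.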

> With the two aforementioned iconv implementations, you can also
> directly use  iconv_open("UTF-16LE","wchar_t").

Err, not with RH 6.2 glibc-2.1.3-15. Your freshmeat link:

http://clisp.cons.org/~haible/packages-libiconv.html

is broken. Can I use the latest libiconv as a shared library? Eh, that's
a lot of questions. I guess I'll just wait for it to finish coming down
my 56k pipe and look at the docs.
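
Whether a given iconv accepts the "wchar_t" name can be probed at run time
instead of assumed. The following is only a sketch: wcs_to_utf16le() is a
hypothetical helper name, and it treats an iconv_open() failure as "this
implementation does not support the conversion".

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts a wide string to UTF-16LE (no terminator).
   Returns the number of bytes written to out, or -1 if this iconv does not
   know the "wchar_t" encoding name or the conversion fails. */
long wcs_to_utf16le(const wchar_t *src, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-16LE", "wchar_t");
    if (cd == (iconv_t)-1) {
        return -1;  /* conversion pair not supported by this iconv */
    }

    char *in = (char *)src;  /* iconv() takes its input as raw bytes */
    size_t inleft = wcslen(src) * sizeof(wchar_t);
    char *dst = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        return -1;  /* e.g. output buffer too small or invalid input */
    }
    return (long)(outsize - outleft);
}
```

A return value of -1 for a simple ASCII string would suggest the installed
iconv behaves like the glibc mentioned above and needs another route.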

Thanks Bruno, I think I'm getting the picture,
Mike

-- 
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/