On Fri, Sep 07, 2001 at 10:21:14PM +0200, Bruno Haible wrote:
> > I gather that I can only assume that wchar_t is just a sequence of UCS
> > codes of sizeof(wchar_t) in size.
>
> You cannot even assume that. wchar_t is locale dependent and
> OS/compiler/vendor dependent. It should never be used for "binary file
> formats and network messages".
Well, I have to normalize to something!
> > But is the in memory representation
> > of a multi-byte string the equivalent of the UTF-8 encoding
>
> Depends where you got the string. In most cases, like when you got it
> from fgets(stdin), it will be in locale dependent encoding (LC_CTYPE
So if I get a string from the host environment (as opposed to a binary
file format or network message), I use the locale-dependent encoding
unless a particular library instructs otherwise. But how do I find out
what that encoding is? If I do:
printf("%s\n", nl_langinfo(CODESET));
I get:
ANSI_X3.4-1968
but running "locale charmap" reports ISO-8859-1.
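A likely explanation, assuming the test program never called setlocale():
a C program starts in the "C" locale, and ANSI_X3.4-1968 is the name glibc
reports for that locale's ASCII codeset, whereas the locale utility reads
LANG/LC_* from the environment. A minimal sketch of the query with the
environment's locale installed first (environment_codeset is a
hypothetical helper name, not part of any library):

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns the codeset name of the locale selected by
   the environment (LANG, LC_ALL, LC_CTYPE), or NULL if that locale cannot
   be installed. */
const char *environment_codeset(void)
{
    /* Until setlocale() is called, the process stays in the "C" locale,
       so nl_langinfo(CODESET) describes "C", not the user's settings. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;

    /* The returned string may be overwritten by later setlocale() or
       nl_langinfo() calls; copy it if it has to be kept. */
    return nl_langinfo(CODESET);
}
```

Printing the result with printf("%s\n", ...) should then agree with what
"locale charmap" shows, provided both run in the same environment.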
> With the two aforementioned iconv implementations, you can also
> directly use iconv_open("UTF-16LE","wchar_t").
Err, not with RH 6.2 glibc-2.1.3-15. Your freshmeat link:
http://clisp.cons.org/~haible/packages-libiconv.html
is broken. Can I use the latest libiconv as a shared library? Eh, that's
a lot of questions. I guess I'll just wait for it to finish coming down
my 56k pipe and look at the docs.
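For reference, a rough sketch of the conversion Bruno describes. It
assumes an iconv implementation that accepts "wchar_t" as an encoding
name, which per the above glibc-2.1.3 does not; wcs_to_utf16le is a
hypothetical helper name, and error handling is kept to the bare minimum:

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts the wide string `in` to UTF-16LE, writing
   into `out` (capacity `outsize` bytes). Returns the number of bytes
   written, or (size_t)-1 if the conversion is unsupported or fails. */
size_t wcs_to_utf16le(const wchar_t *in, char *out, size_t outsize)
{
    /* Argument order is (to, from). Fails with (iconv_t)-1 when this
       iconv does not know the "wchar_t" encoding name. */
    iconv_t cd = iconv_open("UTF-16LE", "wchar_t");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() works on byte buffers, so the wide string is passed as
       raw bytes along with its length in bytes. */
    char *inbuf = (char *)in;
    size_t inleft = wcslen(in) * sizeof(wchar_t);
    char *outbuf = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* invalid input or output buffer too small */
    return outsize - outleft;   /* bytes of UTF-16LE produced */
}
```

Checking the iconv_open() result is the point here: it is how a program
can tell at run time whether the installed iconv knows "wchar_t" at all.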
Thanks Bruno, I think I'm getting the picture,
Mike
--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/