-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

A while back, I made the useful discovery that GCC accepts UTF-8
encoded C source by default, and in the generated object code uses
UTF-8 for narrow (char) strings, and UTF-32/UCS-4 for wide (wchar_t)
strings.
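
One way to check the literal encodings directly is to look at the
values the compiler emitted.  The following is only a sketch: the
function names are made up for illustration, it assumes GCC's default
execution charsets (no -fexec-charset or -fwide-exec-charset options),
and it spells the quote as the universal character name \u2018 so the
result does not depend on how the source file itself is encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of bytes emitted for U+2018 in a narrow string literal.
   Under the assumed UTF-8 execution charset this is 3 (E2 80 98).  */
size_t
narrow_quote_length (void)
{
  return strlen ("\u2018");
}

/* Return 1 if the narrow literal holds exactly the UTF-8 bytes of
   U+2018 followed by the terminating NUL, and 0 otherwise.  */
int
narrow_quote_is_utf8 (void)
{
  const unsigned char *s = (const unsigned char *) "\u2018";
  return s[0] == 0xE2 && s[1] == 0x80 && s[2] == 0x98 && s[3] == 0;
}

/* Value of the first element of the wide literal.  With a UTF-32
   (or UCS-2) wide execution charset this is the code point itself.  */
unsigned long
wide_quote_value (void)
{
  return (unsigned long) L"\u2018"[0];
}

/* Abort if either assumption fails for this compiler.  */
void
check_literal_encodings (void)
{
  assert (narrow_quote_is_utf8 ());
  assert (wide_quote_value () == 0x2018UL);
}
```

Calling check_literal_encodings () at the start of main () would abort
on a compiler configured with different execution charsets.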

As an example:

#include <locale.h>
#include <stdio.h>

int
main (void)
{
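  /* Adopt the locale settings from the environment.  */
  setlocale (LC_ALL, "");
  /* The literal contains non-ASCII quotes, stored as UTF-8 bytes.  */
  printf ("‘Name’\n");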
  return 0;
}

This then correctly outputs the quotes:

$ ./test
‘Name’

A better example is here:

http://groups-beta.google.com/group/comp.lang.c.moderated/msg/bb55bb9f835eba6a?hl=en

In that example, you can output wide strings to narrow streams, and
narrow strings to wide streams.  For this to work, I assume the C
runtime must know something about the execution charsets in order to
do the conversion; otherwise you wouldn't get readable output.
Additionally, when you output a wide string with wprintf(), it must
be recoded to the narrow representation for output.
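
The wide-to-narrow step can be observed in isolation with wcstombs (),
which converts according to the LC_CTYPE category of the current
locale.  This is a minimal sketch, not a statement of how wprintf ()
is implemented: encode_in_locale () is a hypothetical helper, and it
changes the process-wide locale as a side effect.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert WS to the multibyte encoding of the
   locale named LOCALE_NAME and store the byte count in *LEN.
   Returns 0 on success, -1 if the locale is unavailable, and -2 if
   some character cannot be represented in that locale.  */
int
encode_in_locale (const char *locale_name, const wchar_t *ws, size_t *len)
{
  char buf[64];
  size_t n;

  assert (locale_name != NULL && ws != NULL && len != NULL);

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;

  /* wcstombs () uses the LC_CTYPE category of the current locale.  */
  n = wcstombs (buf, ws, sizeof buf);
  if (n == (size_t) -1)
    return -2;

  *len = n;
  return 0;
}
```

In the "C" locale I would expect a pure ASCII string such as L"Name"
to convert to 4 bytes and a string containing U+2018 to be rejected,
while a UTF-8 locale (where one is installed) should accept both.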

The above link is wrong, though.  I thought that, given the C
runtime's knowledge of the execution charsets, it would recode the
output into the locale charset.  However, this does not appear to be
the case: the test program above works the same in the C locale as in
a normal UTF-8 locale.
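
The narrow case can be checked without a terminal by formatting into
a buffer and comparing bytes.  This is only a sketch under the same
assumption of a UTF-8 execution charset; narrow_passes_through () is
a made-up name, and it sets the process-wide locale as a side effect.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: format a UTF-8 literal with "%s" under the
   locale named LOCALE_NAME and compare the result byte for byte.
   Returns 1 if the bytes are unchanged, 0 if they were altered, and
   -1 if the locale is unavailable.  */
int
narrow_passes_through (const char *locale_name)
{
  static const char quoted[] = "\u2018Name\u2019";
  char out[32];
  int n;

  assert (locale_name != NULL);

  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* "%s" copies the bytes of a char string; no charset is consulted.  */
  n = snprintf (out, sizeof out, "%s", quoted);
  return n == (int) strlen (quoted)
         && memcmp (out, quoted, strlen (quoted)) == 0;
}
```

A result of 1 for the "C" locale would be consistent with the
observation that narrow output is passed through untouched rather
than recoded into the locale charset.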

Can anyone confirm whether the above is correct, or point to where
this is documented?


Thanks,
Roger

- -- 
Roger Leigh
                Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
                Debian GNU/Linux        http://www.debian.org/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFCufKJVcFcaSW/uEgRAo2aAKCIyvhXSOGHco9kgLQxK6d4jldEwwCfVESl
SreZmdI9Tl9wSXSncyq0rAM=
=8WxF
-----END PGP SIGNATURE-----

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
