-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Roger Leigh <[EMAIL PROTECTED]> writes:
> Bruno Haible <[EMAIL PROTECTED]> writes:
>
>> Roger Leigh wrote:
>>> Viewed as hexadecimal (aligned for comparison):
>>> "Narrow" UTF-8:
>>> П т н
>>> ==> d0 9f d1 82 d0 bd
>>
>> In UCS-4 these would be
>>
>> 0000041F 00000442 0000043D
>>
>>> "Wide" (unknown):
>>> B =
>>> ==> 1f 42 3d
>>
>> So you can see that it simply used the low 8 bits of every UCS-4 character.
>> Which is broken. Before reporting this as a bug to the GCC people, you
>> might want to find out whether it's a bug in std::wcsftime or a bug in
>> the std::wcout stream.
>
> I've written a plain [C99] test program, below. This works fine with
> wcsftime() and fwprintf(). So, I guess the problem is with "wcout".
>
> If there's a problem like fwide() with iostreams (you can't use
> fwprintf and fprintf with the same stream, or change the byte/wide
> mode once set), perhaps you can't output to both std::cout and
> std::wcout? (This appears to be the case; after removing all the
> std::cout usage in the previous testcase, std::wcout appears to also
> output UTF-8).
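
For reference, the truncation Bruno describes is easy to reproduce by
hand. This is only a minimal sketch: the helper names utf8_decode and
low_byte are my own, and the decoder assumes well-formed input (it does
not validate continuation bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the UTF-8 sequence starting at s into *cp.
   Returns the sequence length in bytes, or 0 if the lead byte is invalid.
   Continuation bytes are not validated (illustration only). */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6)
            | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Keep only the low 8 bits of a code point, as the broken output did. */
unsigned char low_byte(uint32_t cp)
{
    return (unsigned char)(cp & 0xFF);
}
```

Feeding it the bytes d0 9f d1 82 d0 bd gives the three UCS-4 values
quoted above, and low_byte() of each gives 1f 42 3d.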
From what I've found since, you can't mix std::cout and std::wcout at
all, so everything is working as specified in the C and C++
standards. It's a bit worrying, though, if a library you use writes
with the other width: you'll either never see its output, or it will
be badly corrupted.
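
On the C side, the orientation rule can be observed directly with
fwide(). A small sketch, where the function names orientation and
narrow_then_wide are just mine for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Query the orientation of fp without changing it.
   Returns >0 if wide-oriented, <0 if byte-oriented, 0 if not yet set. */
int orientation(FILE *fp)
{
    assert(fp != NULL);
    return fwide(fp, 0);   /* mode 0 only queries */
}

/* Write a narrow string first, then ask for wide orientation.
   Returns the orientation the stream actually ends up with. */
int narrow_then_wide(FILE *fp)
{
    assert(fp != NULL);
    fputs("narrow first\n", fp);   /* this makes the stream byte-oriented */
    return fwide(fp, 1);           /* fwide cannot change it once set */
}
```

Called on a freshly opened stream (for example one from tmpfile()),
orientation() reports 0 before any I/O, and narrow_then_wide() reports
a negative value: the request for wide orientation is ignored.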
One question arising from this is how to use gettext in this
environment. For example:
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <libintl.h>

/* Mark a string for extraction by xgettext without translating it. */
#define N_(String) String

int main(void)
{
  setlocale(LC_ALL, "");

  const char *narrow = N_("Test Unicode (narrow): ÃÃÃ ÐÐÑ!\n");
  fprintf(stdout, "%s\n", gettext(narrow));

  if (fwide (stderr, 1) <= 0)
    fprintf(stdout, "Failed to set stderr to wide orientation\n");

  const wchar_t *wide = N_(L"Test Unicode (wide): ÃÃÃ ÐÐÑ!\n");
  /* gettext() takes and returns char *, so this call is the problem. */
  fwprintf(stderr, L"\n%ls\n", gettext(wide));

  fwprintf(stderr, L"\nNarrow-to-wide: %s\n", narrow);
  fprintf(stdout, "\nWide-to-narrow: %ls\n", wide);

  return 0;
}
If I compile this [C99] source with GCC 3.4, the narrow string is
encoded as UTF-8, and the wide string as UCS-4. If I tell xgettext to
read the file as UTF-8, it correctly extracts the strings into the
POT file. However, the second (wide) gettext call would fail, since
there's no corresponding wide string in the message catalogue. I
guess I can't give gettext wide strings as input, but can I get them
as output?
In this case, I guess it wouldn't be a problem, since I could do
  fwprintf(stderr, L"%s", gettext(string));
and the UTF-8 string returned should be widened to UCS-4 automatically
and then turned into UTF-8 for actual output (perhaps this is
optimised away?). However, for platforms that aren't using
UTF-8/UCS-4 externally/internally, or are using an old encoding, might
I need to do some stuff with iconv to make this work? Or will this
always be taken care of automatically?
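
If a wide copy of the translated string is needed, rather than just
wide output, one possible approach is to widen it by hand with
mbsrtowcs(). This is only a sketch: the helper name widen is my own,
and it assumes the string is in the encoding of the current LC_CTYPE
locale, which is what gettext() returns unless
bind_textdomain_codeset() has been used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string in the current LC_CTYPE encoding into a
   newly allocated wide string. Returns NULL on an invalid multibyte
   sequence or on allocation failure. The caller must free() the result. */
wchar_t *widen(const char *mb)
{
    assert(mb != NULL);

    mbstate_t state;
    const char *src = mb;

    /* First pass: a NULL destination only counts the wide characters. */
    memset(&state, 0, sizeof state);
    size_t len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1)
        return NULL;

    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Second pass: convert, including the terminating null. */
    memset(&state, 0, sizeof state);
    src = mb;
    mbsrtowcs(wide, &src, len + 1, &state);
    return wide;
}
```

After setlocale(LC_ALL, ""), something like widen(gettext(narrow))
could then be printed with fwprintf(stderr, L"%ls", ...). Since the
conversion follows the locale, this should not depend on the locale
being UTF-8, but I have not tested it with older encodings.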
Thanks,
Roger
--
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
iD8DBQFBqJtcVcFcaSW/uEgRAlqrAJ9HunCvTFwloaGmA00BV+8fiAimZACcDHYW
2MizzP1KUBiQ1FbhTJ4jd4c=
=fQkf
-----END PGP SIGNATURE-----
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/