Re: wcsftime output encoding

Roger Leigh Sat, 27 Nov 2004 07:21:53 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roger Leigh <[EMAIL PROTECTED]> writes:


> Bruno Haible <[EMAIL PROTECTED]> writes:
>
>> Roger Leigh wrote:
>>> Viewed as hexadecimal (aligned for comparison):
>>> "Narrow" UTF-8:
>>>     П     т     н
>>> ==> d0 9f d1 82 d0 bd 
>>
>> In UCS-4 these would be
>>
>>       0000041F  00000442  0000043D
>>
>>> "Wide" (unknown):
>>>       B  =   
>>> ==> 1f 42 3d  
>>
>> So you can see that it simply used the low 8 bit of every UCS-4 character.
>> Which is broken. Before reporting this as a bug to the GCC people, you
>> might want to find out whether it's a bug in std::wcsftime or a bug in
>> the std::wcout stream.
>
> I've written a plain [C99] test program, below.  This works fine with
> wcsftime() and fwprintf().  So, I guess the problem is with "wcout".
>
> If there's a problem like fwide() with iostreams (you can't use
> fwprintf and fprintf with the same stream, or change the byte/wide
> mode once set), perhaps you can't output to both std::cout and
> std::wcout?  (This appears to be the case; after removing all the
> std::cout usage in the previous testcase, std::wcout appears to also
> output UTF-8).
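
A quick check of the truncation described above, as a minimal sketch:
keeping only the low 8 bits of each UCS-4 value from the quoted dump
gives exactly the bytes 1f 42 3d.  The helper name low_bytes() is made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the low 8 bits of each UCS-4 code point
   into out[], mimicking the truncation seen in the "wide" output. */
void low_bytes(const uint32_t *ucs4, size_t n, unsigned char *out)
{
  assert(ucs4 != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = (unsigned char)(ucs4[i] & 0xFFu);
}
```

Feeding it 0x041F, 0x0442 and 0x043D yields 0x1f, 0x42 and 0x3d, i.e. a
control character followed by "B" and "=".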

From what I've found since, you can't mix std::cout and std::wcout at
all, so everything is working as specified in the C and C++
standards.  It's a bit worrying if a library you use writes to the
stream with the wrong width, though: you'll never see its output, or
it will be badly corrupted.
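
The C side of this can be seen with fwide(): once a stream has an
orientation, a later request to change it has no effect.  A minimal
sketch, assuming a hosted C99 environment; try_switch_to_wide() is a
hypothetical helper name, not something from the test case.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: do one narrow write on f, then ask for wide
   orientation.  Returns fwide()'s result: <0 byte-oriented, >0 wide. */
int try_switch_to_wide(FILE *f)
{
  assert(f != NULL);
  fputs("narrow\n", f);   /* first narrow I/O makes f byte-oriented */
  return fwide(f, 1);     /* request is ignored once orientation is set */
}
```

On a fresh stream from tmpfile(), fwide(f, 0) returns 0 before any I/O,
and try_switch_to_wide(f) then returns a negative value.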

One question arising from this is how to use gettext in this
environment.  For example:

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
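#include <libintl.h>

/* Assumed definition (not in the original post): N_() only marks a
   string for extraction by xgettext and does not translate it. */
#define N_(String) String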

int main(void)
{
  setlocale(LC_ALL, "");

  const char *narrow = N_("Test Unicode (narrow): ÃÃÃ ÐÐÑ!\n");
  fprintf(stdout, "%s\n", gettext(narrow));

  if (fwide (stderr, 1) <= 0)
    fprintf(stdout, "Failed to set stderr to wide orientation\n");

  const wchar_t *wide = N_(L"Test Unicode (wide): ÃÃÃ ÐÐÑ!\n");
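  /* gettext() takes and returns char *, so this wide msgid is the
     problematic call discussed below. */
  fwprintf(stderr, L"\n%ls\n", gettext(wide));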

  fwprintf(stderr, L"\nNarrow-to-wide: %s\n", narrow);

  fprintf(stdout, "\nWide-to-narrow: %ls\n", wide);

  return 0;
}

If I compile this [C99] source with GCC 3.4, the narrow string is
encoded as UTF-8 and the wide string as UCS-4.  If I tell xgettext to
read the file as UTF-8, it correctly extracts the strings into the
POT file.  However, the second (wide) gettext call would fail, since
there's no corresponding wide string in the message catalogue.  I
guess I can't give gettext wide strings as input, but can I get them
as output?

In this case, I guess it wouldn't be a problem, since I could do

fwprintf(stderr, L"%s", gettext(string));

and the UTF-8 string returned should be widened to UCS-4 automatically
and then converted back to UTF-8 for actual output (perhaps this is
optimised away?).  However, on platforms that don't use UTF-8
externally and UCS-4 internally, or that use a legacy encoding, might
I need to do some conversion with iconv to make this work?  Or will
this always be taken care of automatically?
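
As far as the C standard goes, fwprintf() with a plain %s converts its
multibyte argument as if by repeated calls to mbrtowc(), using the
current locale's LC_CTYPE encoding rather than assuming UTF-8.  The
same widening can be done by hand with mbsrtowcs(); the following is a
minimal sketch, and widen() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: widen a multibyte string in the current
   LC_CTYPE encoding.  Returns a malloc'd wide string that the caller
   must free, or NULL on an invalid sequence or allocation failure. */
wchar_t *widen(const char *mbs)
{
  assert(mbs != NULL);

  mbstate_t st;
  const char *src = mbs;
  memset(&st, 0, sizeof st);                 /* initial shift state */
  size_t n = mbsrtowcs(NULL, &src, 0, &st);  /* count wide chars needed */
  if (n == (size_t)-1)
    return NULL;                             /* invalid multibyte input */

  wchar_t *w = malloc((n + 1) * sizeof *w);
  if (w == NULL)
    return NULL;

  src = mbs;
  memset(&st, 0, sizeof st);
  mbsrtowcs(w, &src, n + 1, &st);            /* convert, including L'\0' */
  return w;
}
```

The result of gettext() could be passed through widen() after
setlocale(LC_ALL, "") has been called; whether the catalogue text
arrives in the locale's encoding in the first place is a separate
question for gettext itself.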


Thanks,
Roger

- -- 
Roger Leigh
                Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
                Debian GNU/Linux        http://www.debian.org/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBqJtcVcFcaSW/uEgRAlqrAJ9HunCvTFwloaGmA00BV+8fiAimZACcDHYW
2MizzP1KUBiQ1FbhTJ4jd4c=
=fQkf
-----END PGP SIGNATURE-----

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
