-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

I've been investigating i18n of time/date formatting with the standard
C and C++ libraries.  However, I've run into an issue with wide
character output.  The documentation doesn't elaborate on what
encoding is used (or even how to specify one), so I hope someone might
have an idea?

The program listed below demonstrates the use of wcsftime() and
std::time_put<wchar_t>, which is a C++ wrapper around it.  I'm not sure
whether this is a platform-dependent feature or part of the C standard.

I've compiled with GCC 3.4.3 on GNU/Linux, and run in an en_GB UTF-8
locale (note that GCC 3.3.x does not handle wide streams well at all).
The output looks like this:

$ ./date3
asctime:                Fri Nov 26 13:26:48 2004
strftime:               Fri 26 Nov 2004 13:26:48 GMT
wcsftime:               Fri 26 Nov 2004 13:26:48 GMT
std::time_put<char>:    Fri 26 Nov 2004 13:26:48 GMT
std::time_put<wchar_t>: Fri 26 Nov 2004 13:26:48 GMT

Everything worked.  It also works if I run in a different locale (all
locales use UTF-8 as their codeset):

$ LANG=de_DE LC_ALL=de_DE ./date3
asctime:                Fri Nov 26 13:28:03 2004
strftime:               Fr 26 Nov 2004 13:28:03 GMT
wcsftime:               Fr 26 Nov 2004 13:28:03 GMT
std::time_put<char>:    Fr 26 Nov 2004 13:28:03 GMT
std::time_put<wchar_t>: Fr 26 Nov 2004 13:28:03 GMT

$ LANG=pt_BR LC_ALL=pt_BR ./date3
asctime:                Fri Nov 26 13:29:18 2004
strftime:               Sex 26 Nov 2004 13:29:18 GMT
wcsftime:               Sex 26 Nov 2004 13:29:18 GMT
std::time_put<char>:    Sex 26 Nov 2004 13:29:18 GMT
std::time_put<wchar_t>: Sex 26 Nov 2004 13:29:18 GMT

However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime:                Fri Nov 26 13:30:08 2004
strftime:               Птн 26 Ноя 2004 13:30:08
wcsftime:               ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>:    Птн 26 Ноя 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

Viewed as hexadecimal (aligned for comparison):
"Narrow" UTF-8:
    П     т     н        2  6     Н     о     я
==> d0 9f d1 82 d0 bd 20 32 36 20 d0 9d d0 be d1 8f
       2  0  0  4     1  1  :  5  1  :  0  4  \n
==> 20 32 30 30 34 20 31 31 3a 35 31 3a 30 34 0a

"Wide" (unknown):
    ^_ B  =              2  6     ^] >  O
==> 1f 42 3d          20 32 36 20 1d 3e 4f
       2  0  0  4     1  1  :  5  1  :  0  4  \n
==> 20 32 30 30 34 20 31 31 3a 35 31 3a 30 34 0a

In this case the "narrow" and "wide" outputs differ.  The "narrow"
output is valid UTF-8, whereas the "wide" output is something
different entirely.  What encoding does wcsftime() use when outputting
characters outside the ASCII range?  UCS-4?  Something
implementation-defined?  I expected that both would result in readable
output; is this assumption incorrect?
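
One way to test the UCS-4 guess against the dump above is to decode
the "narrow" bytes into code points and compare them with the "wide"
bytes.  The sketch below does that; decode_utf8() and low_bytes() are
hypothetical helpers written for this message, not library functions,
and the decoder assumes well-formed input.  For the bytes shown, the
low byte of each code point (U+041F, U+0442, U+043D, ...) matches the
"wide" dump (1f 42 3d ...), which suggests each wchar_t was narrowed to
a single byte rather than encoded.  That is only an observation about
the dump, not a diagnosis of where the narrowing happens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 string into Unicode code points.
// Sketch only: well-formed input is assumed and checked with assert.
std::vector<unsigned long> decode_utf8(const std::string& s)
{
  std::vector<unsigned long> out;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    unsigned long cp = 0;
    std::size_t extra = 0;  // number of continuation bytes
    if (lead < 0x80)                { cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { assert(false && "invalid UTF-8 lead byte"); }
    assert(i + extra < s.size());
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char cont = static_cast<unsigned char>(s[i + k]);
      assert((cont & 0xC0) == 0x80);
      cp = (cp << 6) | (cont & 0x3F);
    }
    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}

// Keep only the least significant byte of each code point, i.e. what
// would be left if every wide character were truncated to one byte.
std::string low_bytes(const std::vector<unsigned long>& cps)
{
  std::string out;
  for (std::size_t i = 0; i < cps.size(); ++i)
    out += static_cast<char>(cps[i] & 0xFF);
  return out;
}
```

Feeding the first 16 bytes of the "narrow" dump to decode_utf8() and
then low_bytes() gives the first 10 bytes of the "wide" dump.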

My question is basically this: what is wcsftime() actually doing, and
how should I get printable output from the wide string it fills for
me?  If I'm plunging into the realms of non-portability by even
considering wide characters, are they worth using at all?
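
One possible route to printable output, sketched here under
assumptions rather than offered as the answer, is to keep using
wcsftime() but convert its result to the locale's multibyte encoding
with wcstombs() and write that to a narrow stream.  to_multibyte() and
format_time() are hypothetical helper names, and the 128-element buffer
is an arbitrary choice.  The conversion follows the LC_CTYPE category
of the current C locale, so non-ASCII text only converts if a suitable
locale has been selected first.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <cwchar>
#include <string>
#include <vector>

// Convert a wide string to the multibyte encoding of the current C
// locale.  Returns an empty string if some character has no
// representation in that encoding.
std::string to_multibyte(const std::wstring& ws)
{
  // Worst case: MB_CUR_MAX bytes per wide character, plus terminator.
  std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
  const std::size_t n = std::wcstombs(&buf[0], ws.c_str(), buf.size());
  if (n == static_cast<std::size_t>(-1))
    return std::string();
  return std::string(&buf[0], n);
}

// Format a broken-down time with wcsftime() and return the result in
// multibyte form.  An empty result means the output did not fit or
// could not be converted.
std::string format_time(const std::tm& t, const wchar_t* fmt)
{
  wchar_t wbuf[128];
  const std::size_t n =
    std::wcsftime(wbuf, sizeof wbuf / sizeof wbuf[0], fmt, &t);
  return to_multibyte(std::wstring(wbuf, n));
}
```

After std::setlocale(LC_ALL, ""), something like
std::cout << format_time(brokentime, L"%c") << '\n' would then go
through the narrow stream only.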

Also, can gettext be used with wide characters?


Many thanks,
Roger


#include <iostream>
#include <locale>
#include <string>
#include <ctime>
#include <cwchar>

int main()
{
  // Set up locale stuff...
  std::locale::global(std::locale(""));
  std::cout.imbue(std::locale());
  std::wcout.imbue(std::locale());

  // Get current time
  std::time_t simpletime = std::time(0);

  // Break down time (localtime_r is POSIX, not ISO C).
  std::tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  std::mktime(&brokentime);

  std::cout << "asctime:                " << std::asctime(&brokentime);

  // Print with strftime(3)
  char buffer[40];
  std::strftime(&buffer[0], 40, "%c", &brokentime);

  std::cout << "strftime:               " << &buffer[0] << '\n';

  wchar_t wbuffer[40];
  std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
  std::wcout << L"wcsftime:               " << &wbuffer[0] << L'\n';

  // Try again, but use proper locale facets...
  const std::time_put<char>& tp =
    std::use_facet<std::time_put<char> >(std::cout.getloc());

  // Pass the pattern as a pointer range; dereferencing end() is undefined.
  std::string pattern("std::time_put<char>:    %c\n");
  tp.put(std::cout, std::cout, std::cout.fill(),
         &brokentime, pattern.data(), pattern.data() + pattern.size());

  // And again, but using wchar_t...
  const std::time_put<wchar_t>& wtp =
    std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());

  // Pass the pattern as a pointer range; dereferencing end() is undefined.
  std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
  wtp.put(std::wcout, std::wcout, std::wcout.fill(),
          &brokentime, wpattern.data(), wpattern.data() + wpattern.size());

  return 0;
}

- -- 
Roger Leigh
                Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
                Debian GNU/Linux        http://www.debian.org/
                GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBp2v9VcFcaSW/uEgRAnZmAKDqzHRWTNm1/MtVyFdV4DAz++eEbACeO9rf
2FXXuGqneJZCAXn/tNhi9lI=
=HW7D
-----END PGP SIGNATURE-----

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
