Daniel Resare writes:
> I need to write a program that reads information from a file in UTF-8
> and display it according to the current locale in a highly portable
> fashion. To my understanding there are three ways to do this, and I would
> be delighted to get some input on which one is the most portable and
> flexible.
> 
> 1) use setlocale() to set a UTF-8 locale and then use mbsrtowcs() to
> convert the string to wchar_t[] and print out with wprintf().
> Problems:
> * To determine a locale (if any) that is UTF-8 enabled and.
> * other threads using LC_CTYPE dependent functions might break.

This is totally unportable; it works only with glibc, where wchar_t is
always UCS-4. Between converting the string to wchar_t[] and printing
with wprintf() you'd have to switch the locale back to the original one
(otherwise you could equally well printf() the UTF-8 string). Outside
glibc, this switch can make all wchar_t[] strings in memory invalid,
because the wchar_t encoding is locale dependent.

> 2) convert the file input to wchar_t using iconv() and print out with
> wprintf().
> Problems:
> * To my understanding there is not much specified about the wchar_t type,
>   so a program converting to it would need to make some assumptions that
>   might not be very portable. (I.e. casting an UCS-4 char* to wchar_t* will
>   work) The __STDC_ISO_10646__ macro can be of some help when detecting truly
>   wicked systems, but no robust solution seems to exist.

This is better but still not fully portable: not all iconv
implementations can convert from/to "wchar_t" yet. Only glibc and
libiconv can.
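
A rough sketch of what option 2 looks like, in case it helps. The helper
name utf8_to_wchar() is made up for illustration, and "WCHAR_T" is the
encoding name as glibc and libiconv spell it; on other iconv
implementations iconv_open() may simply fail, which the sketch reports
as an error.

```c
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 into out[0..outlen)
   using iconv's "WCHAR_T" encoding name. Returns the number of wide
   characters stored (not counting the terminator), or (size_t)-1 on
   failure, including when the iconv implementation does not know
   "WCHAR_T". */
size_t utf8_to_wchar(const char *in, wchar_t *out, size_t outlen)
{
    if (outlen == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;            /* iconv() never writes through this */
    size_t inleft = strlen(in);
    char *outp = (char *)out;
    size_t outbytes = (outlen - 1) * sizeof(wchar_t);  /* room for L'\0' */
    size_t outleft = outbytes;

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return (size_t)-1;             /* invalid input or buffer too small */

    size_t n = (outbytes - outleft) / sizeof(wchar_t);
    out[n] = L'\0';
    return n;
}
```

The result is only meaningful to wprintf() and friends while the locale
in effect is one whose wchar_t encoding matches what iconv produced.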

> 3) convert the file input directly to the output charset as found out by
> querying OUTPUT_CHARSET and nl_langinfo(CODESET) and write it out using
> standard printf().

This is the most portable. Forget about OUTPUT_CHARSET; it isn't
documented anywhere. Only nl_langinfo(CODESET) is documented and
standardized. On platforms where nl_langinfo(CODESET) is not available,
libiconv has a substitute (locale_charset() from the bundled libcharset).
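
A minimal sketch of this approach, assuming a POSIX system with iconv()
and nl_langinfo(). The helper names convert_from_utf8() and
print_utf8_in_locale() and the fixed 1024-byte buffer are made up for
illustration; a real program would loop over the input in chunks.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to
   the charset named `to`, storing a NUL-terminated result in
   out[0..outsize). Returns 0 on success, -1 on any failure (unknown
   charset, invalid input, output buffer too small). */
int convert_from_utf8(const char *to, const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() never writes through this */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;    /* reserve one byte for the NUL */

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;

    *outp = '\0';
    return 0;
}

/* Hypothetical helper: print a UTF-8 string in the current locale's
   encoding. The program must have called setlocale(LC_ALL, "") first,
   otherwise nl_langinfo(CODESET) describes the "C" locale. */
int print_utf8_in_locale(const char *utf8)
{
    char buf[1024];                  /* arbitrary size, chosen for the sketch */
    if (convert_from_utf8(nl_langinfo(CODESET), utf8, buf, sizeof buf) != 0)
        return -1;
    return fputs(buf, stdout) == EOF ? -1 : 0;
}
```

Call setlocale(LC_ALL, "") once at startup, then pass each line read
from the file to print_utf8_in_locale(). What happens to characters the
target charset cannot represent differs between iconv implementations,
so check the return value.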

> Problems:
> * Is OUTPUT_CHARSET a GNU extension, or part of some standard?

OUTPUT_CHARSET is a glibc-specific hack.

> * You lose all useful wchar.h functions in libc.

You can still access these functions after converting the
locale-encoded string with mbstowcs().
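
For instance, something along these lines should work on a string that
is already in the locale's encoding. The helper name
upcase_locale_string() is invented for the example, and the
mbstowcs(NULL, ...) length query is POSIX behaviour rather than plain
ISO C.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercase `in`, a NUL-terminated string in the
   current locale's encoding, into out[0..outsize). Returns 0 on
   success, -1 on failure (invalid input, out of memory, or output
   buffer too small). */
int upcase_locale_string(const char *in, char *out, size_t outsize)
{
    /* POSIX: with a NULL destination, mbstowcs() returns the number of
       wide characters needed, not counting the terminator. */
    size_t n = mbstowcs(NULL, in, 0);
    if (n == (size_t)-1)
        return -1;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;
    mbstowcs(w, in, n + 1);

    /* Any <wchar.h>/<wctype.h> function can be used here. */
    for (size_t i = 0; i < n; i++)
        w[i] = towupper(w[i]);

    /* Convert back to the locale's multibyte encoding. A return value
       equal to outsize means the result was not NUL-terminated. */
    size_t m = wcstombs(out, w, outsize);
    free(w);
    if (m == (size_t)-1 || m >= outsize)
        return -1;
    return 0;
}
```

As long as the locale is not changed between mbstowcs() and wcstombs(),
the wchar_t values stay valid on any platform.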

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
