I need to write a program that reads data from a file in UTF-8 and
displays it according to the current locale, in a highly portable
fashion. To my understanding there are three ways to do this, and I would
be delighted to get some input on which one is the most portable and
flexible.
1) use setlocale() to set a UTF-8 locale and then use mbsrtowcs() to
convert the string to wchar_t[] and print out with wprintf().
Problems:
* You have to determine a locale (if any) that is UTF-8 enabled.
* Other threads using LC_CTYPE-dependent functions might break.
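A minimal sketch of approach 1, assuming the environment actually supplies a UTF-8 locale (e.g. LANG=en_US.UTF-8) -- if it does not, setlocale() may still succeed but mbsrtowcs() will misinterpret the bytes:

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch: convert a UTF-8 string to wchar_t[] using the
 * locale-dependent mbsrtowcs() and print it with wprintf().
 * Returns 0 on success, -1 on failure. */
int print_utf8(const char *utf8)
{
    if (setlocale(LC_CTYPE, "") == NULL)   /* take locale from environment */
        return -1;

    const char *src = utf8;
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = mbsrtowcs(NULL, &src, 0, &st);  /* first pass: measure */
    if (n == (size_t)-1)
        return -1;                             /* invalid multibyte input */

    wchar_t *buf = malloc((n + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;

    src = utf8;
    memset(&st, 0, sizeof st);
    mbsrtowcs(buf, &src, n + 1, &st);          /* second pass: convert */
    wprintf(L"%ls\n", buf);
    free(buf);
    return 0;
}
```

Note that the first wprintf() call sets the stream to wide orientation, so the whole program has to stick to the wide-character stdio functions afterwards.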
2) convert the file input to wchar_t using iconv() and print out with
wprintf().
Problems:
* To my understanding, very little is specified about the wchar_t type,
so a program converting to it would need to make assumptions that
might not be very portable (e.g. that casting a UCS-4 char* to wchar_t*
will work). The __STDC_ISO_10646__ macro can be of some help when detecting
truly wicked systems, but no robust solution seems to exist.
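A sketch of approach 2, assuming GNU iconv, whose pseudo-encoding "WCHAR_T" converts directly to the platform's native wchar_t and so sidesteps guessing whether wchar_t is UCS-4; on other systems you would have to request "UCS-4" (or "UCS-2") and rely on __STDC_ISO_10646__ to know the cast is meaningful:

```c
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch: convert UTF-8 to wchar_t via iconv().
 * Returns the number of wide characters written, or (size_t)-1
 * on failure.  outlen is the capacity of out in wchar_t units. */
size_t utf8_to_wchar(const char *in, wchar_t *out, size_t outlen)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");  /* GNU extension name */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = (char *)out;                     /* iconv works on bytes */
    size_t outleft = outlen * sizeof(wchar_t);

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return (size_t)-1;

    size_t nchars = (outlen * sizeof(wchar_t) - outleft) / sizeof(wchar_t);
    if (nchars < outlen)
        out[nchars] = L'\0';                      /* terminate if room */
    return nchars;
}
```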
3) convert the file input directly to the output charset as found out by
querying OUTPUT_CHARSET and nl_langinfo(CODESET) and write it out using
standard printf().
Problems:
* Is OUTPUT_CHARSET a GNU extension, or part of some standard? I can't find
any documentation on it.
* You lose all the useful wchar.h functions in libc.
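A sketch of approach 3, using only the POSIX nl_langinfo(CODESET) query (OUTPUT_CHARSET is, as far as I can tell, a GNU gettext/libiconv convention rather than a standard) -- no wchar_t is involved, which is exactly the trade-off above:

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: convert UTF-8 input straight to the locale's
 * codeset and write it with plain printf().  Uses a fixed buffer for
 * brevity; real code would loop over iconv() for long input.
 * Returns 0 on success, -1 on failure. */
int print_in_locale_charset(const char *utf8)
{
    setlocale(LC_CTYPE, "");                  /* locale from environment */
    const char *codeset = nl_langinfo(CODESET);

    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char outbuf[4096];
    char *inp = (char *)utf8, *outp = outbuf;
    size_t inleft = strlen(utf8), outleft = sizeof outbuf - 1;

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;                            /* unconvertible input */

    *outp = '\0';
    printf("%s\n", outbuf);
    return 0;
}
```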
Any thoughts on this subject will be greatly appreciated.
/daniel
--
nuclear cia fbi spy password code president bomb
8D97 F297 CA0D 8751 D8EB 12B6 6EA6 727F 9B8D EC2A
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/