Re: Wide character APIs

Bruno Haible Thu, 03 Jul 2003 04:04:12 -0700

Michael B Allen said:
> Since Win32 is one of my target systems I need wide character support.


But Win32 doesn't have reasonable wide characters. They have a 16-bit
type called 'wchar_t', which has not been able to accommodate all
characters since Unicode 3.1. So what they will likely end up doing is
to use UTF-16 as an encoding for 'wchar_t *' strings, which means that
wchar_t doesn't represent a *character* any more - it represents a
UTF-16 code unit.
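
To make the point concrete, here is a minimal sketch of how a character
beyond U+FFFF ends up as two 16-bit units in UTF-16. The function name
utf16_encode is hypothetical, chosen for illustration; it is not a Win32
or libc API.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2),
   or 0 if cp is not a valid character. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (uint16_t)cp;          /* fits in a single unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* outside the Unicode range */
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1D11E (MUSICAL SYMBOL G CLEF) this yields the two units 0xD834
0xDD1E, so code that treats each 16-bit wchar_t as one character sees
two "characters" where there is only one.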

> Is there a serious flaw with wchar_t on Linux?

wchar_t by itself is OK on Linux (it's 32 bits wide). But the functions
fgetwc() and fgetws() - as specified by ISO C99 and POSIX:2001 - have a
big drawback: When you use them, and the input stream/file is not in the
expected encoding, you have no way to determine the invalid byte sequence
and do some corrective action. Using these functions has the effect that
your program becomes

     garbage in - more garbage out
or
     garbage in - abort

You need to use multibyte strings in order to get some decent program
behaviour in the presence of invalid multibyte contents of streams/files.
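
One possible way to do this, as a rough sketch: read the input as bytes
and convert it yourself with mbrtowc(), which tells you exactly where an
invalid sequence starts. The helper name decode_checked and its policy
(count the bad byte, skip it, continue) are assumptions made for this
example; a real program might substitute U+FFFD or report the offset
instead.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
   Decodes buf[0..len) in the current locale's encoding.
   Returns the number of characters decoded; the number of bytes that
   were not part of a valid character is stored in *invalid. */
static size_t decode_checked(const char *buf, size_t len, size_t *invalid)
{
    mbstate_t st;
    size_t pos = 0, nchars = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    *invalid = 0;
    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

        if (n == (size_t)-1) {
            /* Invalid sequence at buf[pos]: the caller-visible position
               is what fgetwc() cannot give you. */
            (*invalid)++;
            pos++;                      /* skip the offending byte */
            memset(&st, 0, sizeof st);  /* state is undefined after an error */
        } else if (n == (size_t)-2) {
            /* Incomplete character at the end of the buffer. */
            *invalid += len - pos;
            break;
        } else {
            pos += (n == 0) ? 1 : n;    /* n == 0 means a NUL byte was decoded */
            nchars++;
        }
    }
    return nchars;
}
```

The result depends on the locale selected with setlocale(); in a UTF-8
locale a stray byte such as 0xFF would be counted in *invalid while the
characters around it are still decoded.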

Bruno

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
