On Thu, 3 Jul 2003, Bruno Haible wrote:

> Michael B Allen said:
> > Since Win32 is one of my target systems I need wide character support.
>
> But Win32 doesn't have reasonable wide characters. They have a 16-bit
> type called 'wchar_t' which cannot accommodate all characters since
> Unicode 3.1. So what they will likely end up doing is to use UTF-16
> as an encoding for 'wchar_t *' strings, which means that wchar_t doesn't
> represent a *character* any more - it represents a UTF-16 memory unit.
Interesting. I didn't know wchar_t was supposed to be able to represent an entire character. I thought it could be anything. I don't see how this is any different from UTF-8. If I code using the rules for UTF-8, then I bet my code will behave more correctly than most folks' Win32 _UNICODE stuff that doesn't consider surrogates. Is there a hitch I'm not seeing?

> > Is there a serious flaw with wchar_t on Linux?
>
> wchar_t by itself is OK on Linux (it's 32-bit wide). But the functions
> fgetwc() and fgetws() - as specified by ISO C 99 and POSIX:2001 - have a
> big drawback: When you use them, and the input stream/file is not in the
> expected encoding, you have no way to determine the invalid byte sequence
> and do some corrective action. Using these functions has the effect that
> your program becomes
>
>    garbage in - more garbage out
> or
>    garbage in - abort
>
> You need to use multibyte strings in order to get some decent program
> behaviour in the presence of invalid multibyte contents of streams/files.

This is good to know. I have been avoiding those functions and converting to/from the locale encoding internally using mbstowcs and wcstombs. I guess I'll just keep doing that.

The other personal debate I'm having is whether or not to make filenames the tchar * type. I would have to convert to the locale encoding for that too.

But no one answered my original question: why are the format specifiers for wide character functions different?

Mike

--
A program should be written to model the concepts of the task it performs rather than the physical world or a process because this maximizes the potential for it to be applied to tasks that are conceptually similar and, more important, to tasks that have not yet been conceived.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
