Re: Wide character APIs

Michael B Allen Thu, 03 Jul 2003 11:36:01 -0700

On Thu, 3 Jul 2003, Bruno Haible wrote:

> Michael B Allen said:
> > Since Win32 is one of my target systems I need wide character support.
> 
> But Win32 doesn't have reasonable wide characters. They have a 16-bit
> type called 'wchar_t' which cannot accommodate all characters since
> Unicode 3.1. So what they will likely end up doing is to use UTF-16
> as an encoding for 'wchar_t *' strings, which means that wchar_t doesn't
> represent a *character* any more - it represents a UTF-16 memory unit.


Interesting. I didn't know wchar_t was supposed to be able to represent
an entire character. I thought it could be anything. I don't see how this
is any different from UTF-8. If I code using the rules for UTF-8 then
I bet my code will behave more correctly than most folks' Win32 _UNICODE
stuff that doesn't consider surrogates. Is there a hitch I'm not seeing?
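
To make the comparison with UTF-8 concrete, here is a minimal sketch of
what "considering surrogates" could look like when a wide string is
treated as UTF-16. The names utf16_decode and utf16_count are made up for
illustration, and replacing an unpaired surrogate with U+FFFD is an
assumption, not something any particular API requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the first units of s.
 * n is the number of 16-bit units available. The code point is stored in
 * *cp and the number of units consumed (1 or 2) is returned; 0 is
 * returned only when n is 0. An unpaired surrogate yields U+FFFD and
 * consumes one unit (an assumed policy). */
size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    uint16_t u = s[0];

    if (u >= 0xD800 && u <= 0xDBFF) {            /* high surrogate */
        if (n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            *cp = 0x10000
                + (((uint32_t)(u - 0xD800) << 10)
                   | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;                            /* high without low */
        return 1;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) {            /* stray low surrogate */
        *cp = 0xFFFD;
        return 1;
    }
    *cp = u;                                     /* ordinary BMP unit */
    return 1;
}

/* Count code points (not 16-bit units) in a UTF-16 buffer of n units. */
size_t utf16_count(const uint16_t *s, size_t n)
{
    size_t count = 0;
    uint32_t cp;

    while (n > 0) {
        size_t used = utf16_decode(s, n, &cp);
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

The loop has the same shape as a UTF-8 decoder: one character may span
more than one storage unit, so code that indexes by unit is only safe if
it never splits a pair.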

> > Is there a serious flaw with wchar_t on Linux?
> 
> wchar_t by itself is OK on Linux (it's 32-bit wide). But the functions
> fgetwc() and fgetws() - as specified by ISO C 99 and POSIX:2001 - have a
> big drawback: When you use them, and the input stream/file is not in the
> expected encoding, you have no way to determine the invalid byte sequence
> and do some corrective action. Using these functions has the effect that
> your program becomes
> 
>      garbage in - more garbage out
> or
>      garbage in - abort
> 
> You need to use multibyte strings in order to get some decent program
> behaviour in the presence of invalid multibyte contents of streams/files.

This is good to know. I have been avoiding those functions and converting
to/from the locale encoding internally using mbstowcs and wcstombs. I
guess I'll just keep doing that. The other personal debate I'm having
is whether or not to make filenames the tchar * type. I would have to
convert to the locale encoding for that too.
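
As a rough illustration of converting from the locale encoding while
still seeing the invalid bytes, here is a sketch built on the standard
mbrtowc function. The function name mbs_to_wcs_lossy and the choice to
substitute L'?' for each bad byte are assumptions made for the example;
a real program might prefer to report the offset or stop instead.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src (in the current
 * locale's encoding) into dst, which has room for dstlen wide characters
 * including the terminator. Each byte that mbrtowc rejects is replaced
 * by L'?' and counted in *bad (an assumed policy). Returns the number of
 * wide characters written, not counting the terminator. */
size_t mbs_to_wcs_lossy(wchar_t *dst, size_t dstlen,
                        const char *src, size_t *bad)
{
    mbstate_t st;
    size_t remaining = strlen(src);
    size_t out = 0;

    *bad = 0;
    if (dstlen == 0)
        return 0;
    memset(&st, 0, sizeof st);

    while (remaining > 0 && out + 1 < dstlen) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src, remaining, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* -1: invalid sequence, -2: sequence cut off at the end.
             * Skip one byte, restart from the initial shift state. */
            wc = L'?';
            r = 1;
            ++*bad;
            memset(&st, 0, sizeof st);
        } else if (r == 0) {
            break;          /* embedded NUL; not expected after strlen */
        }
        dst[out++] = wc;
        src += r;
        remaining -= r;
    }
    dst[out] = L'\0';
    return out;
}
```

The caller is expected to have selected the locale with setlocale first.
Unlike a single mbstowcs call, which just returns (size_t)-1 on the first
bad sequence, this loop knows exactly which byte was rejected and can
carry on.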

But no one answered my original question: why are the format specifiers
for wide character functions different?
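
To show what I mean, and assuming the difference in question is %s
versus %ls, here is a small sketch of how ISO C defines them. The wrapper
names format_wide and format_narrow are made up for the example. In ISO C
both families agree: %s takes a char * and %ls takes a wchar_t *.
Microsoft's runtime has traditionally treated %s in the wide functions as
a wide string, so the first call below is not portable to it as written.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical wrapper: format into a wide buffer. Under ISO C rules,
 * %ls consumes the wchar_t * argument and %s consumes the char *
 * argument, which is converted from the locale encoding. */
int format_wide(wchar_t *dst, size_t n, const wchar_t *w, const char *s)
{
    return swprintf(dst, n, L"%ls/%s", w, s);
}

/* Hypothetical wrapper: the same specifiers in the narrow family. Here
 * %ls converts the wide string to the locale encoding on output. */
int format_narrow(char *dst, size_t n, const wchar_t *w, const char *s)
{
    return snprintf(dst, n, "%ls/%s", w, s);
}
```

Calling format_wide(buf, 32, L"wide", "narrow") on a conforming libc
fills buf with the wide string "wide/narrow"; format_narrow gives the
same text as a multibyte string.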

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
