Re: UTF16 and GCC

Robert de Bath Wed, 08 Aug 2001 04:05:16 -0700
On 7 Aug 2001, Kai Henningsen wrote:

> [EMAIL PROTECTED] (Florian Weimer)  wrote on 05.08.01 in 
><[EMAIL PROTECTED]>:
>
> > [EMAIL PROTECTED] (Kai Henningsen) writes:
> >
> > > * Do we need a "native" wide char encoding, too (mostly for Win32 where
> > > it's UTF-16, but possibly also some Asian thing)?
> >
> > A single 'char' encoded in UTF-16?  This sounds horrible.
>
> I can't quite parse that.
wchar_t or WCHAR under Win32 is UTF-16, which has the worst properties
of both UTF-8 and UCS-4: a character can take more than one WCHAR, and
sizeof(WCHAR) is greater than one.
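To make the "more than one WCHAR" point concrete, here is a minimal sketch in portable C. It uses uint16_t in place of WCHAR, because wchar_t is 32 bits with GCC on most Unix systems. Any code point above U+FFFF becomes a surrogate pair, so counting 16-bit units does not count characters. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16 into out[0..1].
   Returns the number of 16-bit code units written (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Reject surrogate code points and values beyond U+10FFFF. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                      /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));       /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));     /* low surrogate */
    return 2;
}

/* Count characters in n UTF-16 units, assuming well-formed input:
   a low surrogate continues the previous character, so skip it. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t chars = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0xDC00 || s[i] > 0xDFFF)
            chars++;
    }
    return chars;
}
```

For example, U+1F600 encodes as the two units 0xD83D 0xDE00, so a three-unit buffer holding "A" followed by that character contains only two characters.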

> For some locales, there is no locale-specific 8 bit code page (I think
> they substitute 1252 in those cases).
>
> As for the "Asian thing", I have a dim recollection of having heard of
> some Asian charsets that really are 16 bit, not ISO 2022-style multibyte.
> I don't claim to know if they are really used that way. That's why
> "possibly".
The Indian character sets are the usual example of a set that has no
'ANSI code page'. The Asian 'ANSI code pages' are a limited form of MBCS,
with at most two bytes per character and a fixed encoding (no long-range
state as in ISO 2022). In addition, cp1258 is seriously evil for a
Windows ANSI code page, as it includes Unicode-style combining characters
(the complete reverse of all the other 'ANSI code pages')!
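Because a DBCS character is either one byte or a lead byte plus exactly one trail byte, a string can be walked without carrying any shift state. The sketch below assumes the code page 932 (Shift JIS) lead-byte ranges 0x81-0x9F and 0xE0-0xFC. Real Windows code would ask the system instead, e.g. via IsDBCSLeadByteEx. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lead-byte test, assuming cp932 (Shift JIS) ranges. */
static int cp932_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters in an n-byte DBCS string. The only "state" needed
   is the current position: no escape sequences, unlike ISO 2022. */
size_t dbcs_count_chars(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);   /* NULL is only allowed when empty */
    size_t chars = 0;
    size_t i = 0;
    while (i < n) {
        if (cp932_is_lead_byte(s[i]) && i + 1 < n)
            i += 2;                /* lead byte + one trail byte */
        else
            i += 1;                /* single-byte character */
        chars++;
    }
    return chars;
}
```

With these ranges, the bytes 0x41 0x82 0xA0 0x42 count as three characters: 'A', one two-byte character, and 'B'.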

Win2000 also supports UTF-8 as an 'ANSI code page', though no locales
seem to use it. Just as well: a program designed with Windows DBCS in
mind would still not be able to handle UTF-8.
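One reason DBCS-minded code breaks on UTF-8 is that UTF-8 characters can be three or four bytes long, while a DBCS walker assumes at most one trail byte. The sketch below contrasts a UTF-8 character count with a simplified, hypothetical model of a DBCS-style walker that treats any byte of 0x80 or above as a lead byte. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its first byte;
   0 for a byte that cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Count characters in UTF-8 (trail bytes are not validated).
   A stray byte is counted as one character and skipped. */
size_t utf8_count_chars(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t chars = 0;
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(s[i]);
        i += len ? len : 1;
        chars++;
    }
    return chars;
}

/* Hypothetical model of a DBCS-style walker: every byte >= 0x80 is
   assumed to be a lead byte followed by exactly one trail byte. */
size_t two_byte_max_count(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t chars = 0;
    size_t i = 0;
    while (i < n) {
        i += (s[i] >= 0x80 && i + 1 < n) ? 2 : 1;
        chars++;
    }
    return chars;
}
```

For the euro sign (0xE2 0x82 0xAC in UTF-8), utf8_count_chars returns 1, but the two-byte model returns 2 and splits the character down the middle.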

> Win32 has two standard encodings
The function renaming in Win32 is just plain evil. The worst part is that
a program compiled with -D_UNICODE will not run on any of the '9x series.
This means you need two versions of the compiled program, one for
'9x and one for NT. The second major problem is that a normal 'ANSI'
program cannot simply be recompiled for Unicode, because the interface of
nearly every function changes; you have to replace every char with TCHAR
and every char * with TCHAR * without breaking anything.
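The renaming works roughly like the sketch below, which models <tchar.h> with hypothetical names so that it builds with GCC anywhere. MY_UNICODE stands in for _UNICODE, my_tchar for TCHAR, MY_T for _T and my_tcslen for _tcslen. A single macro switches both the character type and the function behind each name, which is why every char and every string call has to be converted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T and _tcslen. Defining
   MY_UNICODE plays the role of -D_UNICODE: the same names now
   mean a different type and a different function. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s) L##s
#define my_tcslen wcslen
#else
typedef char my_tchar;
#define MY_T(s) s
#define my_tcslen strlen
#endif

/* Code written against the my_tchar layer compiles either way;
   code that hard-codes char, "..." or strlen does not switch over. */
size_t greeting_length(void)
{
    const my_tchar *msg = MY_T("hello");
    return my_tcslen(msg);
}

/* The element size changes, and with it every string interface. */
size_t tchar_size(void)
{
    return sizeof(my_tchar);
}
```

Built without MY_UNICODE, tchar_size() is 1; with it, tchar_size() is sizeof(wchar_t). That yields two binaries from one source, as described above.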

Making a binary that will run on all versions of Windows in all locales
is almost impossible; even at the source code level it is difficult.

IMO Windows is a very good example of how not to use Unicode!

-- 
Rob.                          (Robert de Bath <robert$ @ debath.co.uk>)
                                       <http://www.cix.co.uk/~mayday>



-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/