RE: Proposal for 2 Byte Unicode implementation in gcc and glibc

Karlsson Kent - keka Wed, 09 Aug 2000 04:56:18 -0700


> -----Original Message-----
> From: Markus Kuhn [mailto:[EMAIL PROTECTED]]

...
> Your friendly amateur Ada language lawyer begs to differ. Ada does not
> dictate that Wide_Character'Size = 16. It only says that it "is a
> character type whose values correspond to the 65536 code positions of
> the ISO 10646 Basic Multilingual Plane (BMP)", but not what the memory
> representation of that type is. Wide_Character is an enumeration type
> with 2**16 values, i.e.
> 
>   type Wide_Character is (nul, soh ... FFFE, FFFF);
> 
> Ada gurus will remember that the value range and the memory size can be
> handled rather independently in Ada, and I see no reason why a compiler
> author could not add a
> 
>   for Wide_Character'Size use 32;
> 
> to his version of package Standard. You should of course get immediately
> a CONSTRAINT_ERROR exception if a value > U+FFFF found its way from C
> code into an Ada program.

Does that also mean that Character'Size can be any size *greater than*
or equal to 8 too?

The comment (in http://www.adahome.com/rm95/rm9x-A-01.html) that
"The first 256 positions have the same contents as type Character."
seems to imply that Character and Wide_Character have the same storage
width.  So *both* Character (with values limited to 0..255) and
Wide_Character *could* really be, at the storage level, a UCS-2 character,
a UTF-16 code unit, or a UTF-32 character...
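
To make the storage-width versus value-range distinction concrete, here
is a minimal C sketch. It assumes a 32-bit storage unit shared by both
types; the names storage_unit, fits_character and fits_wide_character are
hypothetical and come from neither Ada nor any C library. The range test
stands in for the check that would raise CONSTRAINT_ERROR in Ada.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one 32-bit storage unit used for both character types. */
typedef uint32_t storage_unit;

/* The value ranges are narrower than the storage width. */
#define CHARACTER_LAST      0xFFu    /* Ada Character: 0 .. 255 */
#define WIDE_CHARACTER_LAST 0xFFFFu  /* Ada Wide_Character: 0 .. 65535 */

/* True if v is a legal Character value, whatever its storage width. */
bool fits_character(storage_unit v)
{
    return v <= CHARACTER_LAST;
}

/* True if v is a legal Wide_Character value (i.e. lies in the BMP). */
bool fits_wide_character(storage_unit v)
{
    return v <= WIDE_CHARACTER_LAST;
}
```

With this layout, a value such as 0x10000 arriving from C code fits in
the storage unit but fails fits_wide_character, which is the point where
an Ada implementation would be expected to reject it.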


> I personally think that sizeof(wchar_t) == 4 is the right thing to do in
> a C library and that Ada, Java, Win32, etc. will eventually have to find
> ways around their restrictions to 16-bit. UTF-16 is not always the right
> answer, because it is a multi-word encoding.

Well, I think that for Java and Win32 the answer will be/is that
UTF-16 is used (internally) for the string datatypes; and that Java's 
"char" is (will be) a UTF-16 code unit (not necessarily a character).
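
As an illustration of why a UTF-16 code unit is not necessarily a
character, here is a small C sketch of the standard UTF-16 encoding of a
single code point. The function name utf16_encode is my own and is not
taken from glibc, Java or Win32.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16.
 * Writes 1 or 2 code units to out (which must have room for 2) and
 * returns the number written, or 0 if cp is a surrogate value or lies
 * above U+10FFFF and therefore cannot be encoded. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;

    if (cp < 0x10000u) {
        /* BMP character: one code unit, identical to its UCS-2 value. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary character: split the 20-bit offset into a
     * high surrogate and a low surrogate. */
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u + (cp >> 10));
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu));
    return 2;
}
```

A BMP character such as U+0041 comes out as one code unit, while U+1D11E
comes out as the pair D834 DD1E, so a 16-bit "char" holds only half of
that character.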


                Kind regards
                /kent k
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/