Florian Weimer wrote on 2000-08-08 17:45 UTC:
[Discussion on sizeof(wchar_t) == 4 in glibc]
> Ada's Wide_Character has 16 bits as well--it's mandated by the
> language, which means that transferring data between Ada and C code is
> more complicated when GNU libc is used.
Your friendly amateur Ada language lawyer begs to differ. Ada does not
dictate that Wide_Character'Size = 16. It only says that it "is a
character type whose values correspond to the 65536 code positions of
the ISO 10646 Basic Multilingual Plane (BMP)", but not what the memory
representation of that type is. Wide_Character is an enumeration type
with 2**16 values, i.e.
type Wide_Character is (nul, soh ... FFFE, FFFF);
Ada gurus will remember that the value range and the memory size can be
handled rather independently in Ada, and I see no reason why a compiler
author could not add a
for Wide_Character'Size use 32;
to his version of package Standard. Of course, you should then
immediately get a CONSTRAINT_ERROR exception if a value > U+FFFF found
its way from C code into an Ada program.
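
To make the range check at the C/Ada boundary concrete, here is a minimal
C sketch of the test a binding layer could perform before handing a
32-bit character value to a type that only covers the BMP. The function
name ucs4_to_bmp16 is hypothetical, and the sketch assumes the C side
stores UCS-4 values (as glibc's 4-byte wchar_t does); an Ada runtime would
raise CONSTRAINT_ERROR where this code returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any library.
 * Stores cp in *out and returns 1 if cp is one of the 65536 BMP code
 * positions (U+0000..U+FFFF); otherwise returns 0 and leaves *out
 * unchanged. Surrogate code positions are not rejected here, because
 * a 2**16-value character type covers them as well. */
int ucs4_to_bmp16(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp > 0xFFFFu) {
        return 0;               /* outside the BMP: caller must handle it */
    }
    *out = (uint16_t)cp;
    return 1;
}
```

On a platform where wchar_t is 32 bits wide, a value wc could be passed
as ucs4_to_bmp16((uint32_t)wc, &u).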
The standard also explicitly allows for a non-standard mode where
Wide_Character can have other semantics. The authors were probably
primarily thinking about JIS X 0208 and friends, but UCS-4 is just as
natural an option here.
I personally think that sizeof(wchar_t) == 4 is the right thing to do in
a C library and that Ada, Java, Win32, etc. will eventually have to find
ways around their restrictions to 16-bit. UTF-16 is not always the right
answer, because it is a multi-word encoding.
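
To illustrate what "multi-word" means here, the following minimal C
sketch encodes a single code point as UTF-16. Anything above U+FFFF
needs two 16-bit units (a surrogate pair), so one character no longer
corresponds to one array element. The function name utf16_encode is made
up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any library.
 * Encodes code point cp as UTF-16 into out[0..1].
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * a surrogate code point or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800u && cp <= 0xDFFFu) {
        return 0;                       /* surrogates are not characters */
    }
    if (cp > 0x10FFFFu) {
        return 0;                       /* not reachable in UTF-16 */
    }
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;          /* BMP: a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000u;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u + (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu));  /* low surrogate */
    return 2;
}
```

For example, U+20AC fits in one unit, while U+1D11E becomes the pair
D834 DD1E. With a 4-byte wchar_t, both are a single array element.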
http://www.adahome.com/rm95/rm9x-03-05-02.html
http://www.adahome.com/rm95/rm9x-A-01.html
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/