Re: Proposal for 2 Byte Unicode implementation in gcc and glibc

Markus Kuhn Tue, 08 Aug 2000 17:57:13 -0700
Florian Weimer wrote on 2000-08-08 17:45 UTC:
[Discussion on sizeof(wchar_t) == 4 in glibc]
> Ada's Wide_Character has 16 bits as well--it's mandated by the
> language, which means that transferring data between Ada and C code is
> more complicated when GNU libc is used.

Your friendly amateur Ada language lawyer begs to differ. Ada does not
dictate that Wide_Character'Size = 16. It only says that it "is a
character type whose values correspond to the 65536 code positions of
the ISO 10646 Basic Multilingual Plane (BMP)", but not what the memory
representation of that type is. Wide_Character is an enumeration type
with 2**16 values, i.e.

  type Wide_Character is (nul, soh ... FFFE, FFFF);

Ada gurus will remember that the value range and the memory size can be
handled rather independently in Ada, and I see no reason why a compiler
author could not add a

  for Wide_Character'Size use 32;

to his version of package Standard. You should then of course immediately
get a CONSTRAINT_ERROR exception if a value > U+FFFF finds its way from C
code into an Ada program.
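
For the C side of such an interface, the same range check could be sketched
as follows. This is only an illustration, not anything from glibc or an Ada
binding: narrow_to_bmp is a hypothetical helper name, and the sketch assumes
a 32-bit wchar_t as in glibc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper (not part of any library): narrow a wide character
 * to a 16-bit BMP value before handing it to code that only accepts
 * U+0000..U+FFFF. Returns false instead of raising an exception when the
 * value does not fit; *out is left untouched in that case. */
static bool narrow_to_bmp(wchar_t wc, uint16_t *out)
{
    assert(out != NULL);

    /* Convert to unsigned first so that a negative value on platforms
     * with a signed wchar_t is rejected as out of range too. */
    uint32_t v = (uint32_t)wc;
    if (v > 0xFFFFu)
        return false;

    *out = (uint16_t)v;
    return true;
}
```

A caller would test the return value and report the failure itself, which
is roughly the job CONSTRAINT_ERROR does automatically on the Ada side.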

The standard also explicitly allows for a non-standard mode where
Wide_Character can have other semantics. The authors were probably
primarily thinking about JIS X 0208 and friends, but UCS-4 is just as
natural an option here.

I personally think that sizeof(wchar_t) == 4 is the right thing to do in
a C library and that Ada, Java, Win32, etc. will eventually have to find
ways around their 16-bit restrictions. UTF-16 is not always the right
answer, because it is a multi-word encoding: a character outside the BMP
takes two 16-bit words.
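
To make the multi-word point concrete, here is a minimal sketch of UTF-16
encoding for a single code point. The function name utf16_encode is made up
for this illustration; the arithmetic is the standard surrogate-pair
construction, with the UTF-16 upper limit of U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper: encode one code point as UTF-16.
 * Returns the number of 16-bit words written to out (1 or 2),
 * or 0 if cp is a surrogate value or lies beyond U+10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are not characters */
    if (cp > 0x10FFFF)
        return 0;                    /* not reachable in UTF-16 */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;       /* BMP: one word is enough */
        return 1;
    }

    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this, U+20AC comes out as the single word 0x20AC, while U+1D11E needs
the pair 0xD834 0xDD1E. Code that indexes or counts characters therefore
cannot treat each 16-bit word as one character, which a 32-bit wchar_t
avoids.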

http://www.adahome.com/rm95/rm9x-03-05-02.html
http://www.adahome.com/rm95/rm9x-A-01.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/