Hamilton Richards <[EMAIL PROTECTED]> writes:

> At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
> >I have just been reading through the Haskell report to refresh my
> >memory of the language. I was surprised to see this:
> >
> >The character type Char is an enumeration and consists of 16 bit values,
> >conforming to
> >the Unicode standard [10].
> >
> >Unicode uses 24-bit values to identify characters.
> 
> According to the official Unicode web site [0],
> 
>       The Unicode Standard defines three encoding forms
>       that allow the same data to be transmitted in a byte,
>       word or double word oriented format (i.e. in 8, 16 or
>       32-bits per code unit).
> 
> [0] http://www.unicode.org/unicode/standard/principles.html

You have to distinguish between the encoding forms (the
UTF-8, UTF-16 and UTF-32 you refer to) and the Unicode
(ISO 10646) table of code points itself.

16 bits is enough to describe the Basic Multilingual Plane,
and I think 24 bits covers all the currently defined extended
planes.  So I guess the report just refers to the BMP.
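
To make the difference between a code point and its encoding concrete,
here is a minimal sketch, not taken from the report. The name utf16Units
is made up for illustration, and it assumes an implementation whose Char
covers code points beyond the BMP (as current GHC does), which is wider
than the 16-bit Char the report describes. A BMP character is one 16-bit
code unit in UTF-16, while a character from an extended plane needs two
(a surrogate pair):

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode a single code point as a list of UTF-16 code units.
-- Assumption: the input is not itself a surrogate (U+D800..U+DFFF);
-- this sketch does not reject such values.
utf16Units :: Char -> [Word16]
utf16Units c
  | cp < 0x10000 = [fromIntegral cp]                      -- BMP: one unit
  | otherwise    = [ fromIntegral (0xD800 + (v `shiftR` 10))  -- high surrogate
                   , fromIntegral (0xDC00 + (v .&. 0x3FF)) ]  -- low surrogate
  where
    cp = ord c          -- the code point as an Int
    v  = cp - 0x10000   -- 20-bit offset into the extended planes

main :: IO ()
main = do
  print (utf16Units 'A')        -- [65]
  print (utf16Units '\x10400')  -- [55297,56320], i.e. 0xD801 0xDC00
```

So the code point U+10400 does not fit in 16 bits, but UTF-16 can still
carry it as two 16-bit units.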

Jens


