Hamilton Richards <[EMAIL PROTECTED]> writes:

> At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
> >I have just been reading through the Haskell report to refresh my
> >memory of the language. I was surprised to see this:
> >
> >The character type Char is an enumeration and consists of 16 bit values,
> >conforming to
> >the Unicode standard [10].
> >
> >Unicode uses 24-bit values to identify characters.
>
> According to the official Unicode web site [0],
>
>    The Unicode Standard defines three encoding forms
>    that allow the same data to be transmitted in a byte,
>    word or double word oriented format (i.e. in 8, 16 or
>    32-bits per code unit).
>
> [0] http://www.unicode.org/unicode/standard/principles.html

You have to distinguish between the encoding forms (you are referring to
UTF-8, UTF-16 and UTF-32) and the Unicode (ISO 10646) tables of code
points themselves. 16 bits are enough to describe the Basic Multilingual
Plane, and I think 24 bits are enough for all the currently defined
extended planes. So I guess the report just refers to the BMP.

Jens
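
To make the code point versus encoding form distinction concrete, here is a minimal sketch in Haskell. It assumes a present-day GHC, where Char covers the whole Unicode code space rather than only 16-bit values. The names toUtf16 and bitsNeeded are hypothetical helpers made up for this illustration, not library functions.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one code point as UTF-16 code units (hypothetical helper).
-- Code points up to U+FFFF (the BMP) fit in a single 16-bit unit;
-- anything above is split into a high and a low surrogate.
-- Surrogate code points themselves (U+D800..U+DFFF) are passed through
-- unchecked, which a real encoder would reject.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | cp < 0x10000 = [fromIntegral cp]
  | otherwise    = [hi, lo]
  where
    cp = ord c
    v  = cp - 0x10000
    hi = fromIntegral (0xD800 + (v `shiftR` 10))  -- top 10 bits of v
    lo = fromIntegral (0xDC00 + (v .&. 0x3FF))    -- bottom 10 bits of v

-- | Number of bits needed to write a non-negative Int in binary
-- (hypothetical helper).
bitsNeeded :: Int -> Int
bitsNeeded n = length (takeWhile (> 0) (iterate (`shiftR` 1) n))

main :: IO ()
main = do
  print (toUtf16 'A')                          -- BMP: one code unit
  print (toUtf16 '\x1D11E')                    -- beyond the BMP: two code units
  print (bitsNeeded (ord (maxBound :: Char)))  -- width of the largest code point
```

A BMP character comes out as one 16-bit unit, while U+1D11E needs a surrogate pair. This is why a 16-bit Char can hold any BMP code point but not every Unicode character. Under the GHC assumption above, the last line reports how many bits the largest code point actually requires.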