Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

David Menendez Mon, 05 Feb 2007 20:05:45 -0800

Alistair Bayley writes:

> On 05/02/07, Chris Kuklewicz <[EMAIL PROTECTED]> wrote:


> > UTF-8 is a 4 byte encoding.  There is no valid UTF-8 5 or 6 byte
> > encoding.
> 
> Chris is right here, in that Takusen's decoder is incorrect w.r.t. the
> standard, in allowing up to 6 bytes to encode a single char.

<snip> 

> There's nothing stopping the Unicode consortium from expanding the
> range of codepoints, is there? Or have they said that'll never happen?

I believe they have said it won't happen. In particular, UTF-16 can only
encode code points up to U+10FFFF.
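
To make the arithmetic concrete, here is a rough sketch (not taken from
Takusen or Data.CompactString; the function names are made up for
illustration) of where the 10FFFF ceiling comes from. It assumes the
standard surrogate ranges D800-DBFF and DC00-DFFF and the RFC 3629 byte
ranges for UTF-8.

```python
# Illustrative helpers only; names are hypothetical, not from any library.

def decode_surrogate_pair(high, low):
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate contributes 10 bits on top of the 0x10000 offset.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_length(code_point):
    """Bytes needed to encode a code point in UTF-8 (RFC 3629 ranges)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# The largest pair gives the largest code point UTF-16 can express.
max_cp = decode_surrogate_pair(0xDBFF, 0xDFFF)
print(hex(max_cp), utf8_length(max_cp))  # 0x10ffff 4
```

So the largest surrogate pair lands exactly on 10FFFF, and that code
point already fits in 4 UTF-8 bytes; the 5 and 6 byte forms would only be
needed for values UTF-16 cannot represent at all.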

From <http://en.wikipedia.org/wiki/Universal_Character_Set>:

> the UCS stops at 10FFFF and ISO/IEC 10646 has stated that all future
> assignments of characters will also take place in that range
[...]
> ISO 10646 was limited to contain as many characters as could be
> encoded by UTF-16 and no more, that is, a little over a million
> characters instead of over 2,000 million
-- 
David Menendez <[EMAIL PROTECTED]> | "In this house, we obey the laws
<http://www.eyrie.org/~zednenem>      |        of thermodynamics!"