Alistair Bayley writes:

> On 05/02/07, Chris Kuklewicz <[EMAIL PROTECTED]> wrote:
> > UTF-8 is a 4 byte encoding. There is no valid UTF-8 5 or 6 byte
> > encoding.
>
> Chris is right here, in that Takusen's decoder is incorrect w.r.t. the
> standard, in allowing up to 6 bytes to encode a single char.
<snip>
> There's nothing stopping the Unicode consortium from expanding the
> range of codepoints, is there? Or have they said that'll never happen?

I believe they have. In particular, UTF-16 only supports code points up
to 10FFFF. From <http://en.wikipedia.org/wiki/Universal_Character_Set>:

> the UCS stops at 10FFFF and ISO/IEC 10646 has stated that all future
> assignments of characters will also take place in that range [...]
> ISO 10646 was limited to contain as many characters as could be
> encoded by UTF-16 and no more, that is, a little over a million
> characters instead of over 2,000 million

-- 
David Menendez <[EMAIL PROTECTED]>    | "In this house, we obey the laws
<http://www.eyrie.org/~zednenem>      |        of thermodynamics!"
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
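For concreteness, here is a minimal sketch of what a decoder respecting the 4-byte limit might check. It is not Takusen's actual decoder; the names maxCodePoint, seqLength, decodeOne and decodeUtf8 are hypothetical, and it uses only base. The ceiling comes from UTF-16: surrogate pairs cover 2^20 code points above 0xFFFF, so the top is 0x10000 + 2^20 - 1 = 0x10FFFF. In UTF-8 that means lead bytes 0xF8 and up (the old 5- and 6-byte forms) are invalid, and so are 0xF5 to 0xF7, since every 4-byte sequence starting with them decodes above U+10FFFF.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- | Highest UCS code point. UTF-16 surrogate pairs reach
-- 0x10000 + (2^20 - 1) = 0x10FFFF and no further.
maxCodePoint :: Int
maxCodePoint = 0x10000 + (2 ^ (20 :: Int) - 1)

-- | Sequence length implied by a UTF-8 lead byte (hypothetical helper).
seqLength :: Word8 -> Maybe Int
seqLength b
  | b < 0x80  = Just 1
  | b < 0xC2  = Nothing  -- continuation byte, or overlong lead 0xC0/0xC1
  | b < 0xE0  = Just 2
  | b < 0xF0  = Just 3
  | b < 0xF5  = Just 4   -- 0xF5..0xF7 would start a code point above U+10FFFF
  | otherwise = Nothing  -- 0xF8..0xFF: the obsolete 5- and 6-byte forms

-- | Decode one code point from the front of a byte list.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b : rest) = do
  n <- seqLength b
  let (conts, rest') = splitAt (n - 1) rest
  if length conts /= n - 1 || any (\c -> c .&. 0xC0 /= 0x80) conts
    then Nothing  -- truncated input or a byte that is not 10xxxxxx
    else do
      -- payload bits of the lead byte, then 6 bits per continuation byte
      let start = fromIntegral b .&. leadMask n
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) start conts
      if valid n cp then Just (chr cp, rest') else Nothing
  where
    leadMask :: Int -> Int
    leadMask 1 = 0x7F
    leadMask 2 = 0x1F
    leadMask 3 = 0x0F
    leadMask _ = 0x07
    -- reject overlong encodings, surrogates, and anything above U+10FFFF
    valid :: Int -> Int -> Bool
    valid len cp =
      cp >= minFor len && cp <= maxCodePoint && not (cp >= 0xD800 && cp <= 0xDFFF)
    minFor :: Int -> Int
    minFor 1 = 0
    minFor 2 = 0x80
    minFor 3 = 0x800
    minFor _ = 0x10000

-- | Decode a whole byte list, failing on the first invalid sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 bs = do
  (c, rest) <- decodeOne bs
  (c :) <$> decodeUtf8 rest
```

With this, [0xF4, 0x8F, 0xBF, 0xBF] decodes to U+10FFFF, while [0xF4, 0x90, 0x80, 0x80] (which would be 0x110000) and any sequence led by 0xFC are rejected. A table-driven decoder would work just as well; the point is only that the range check lives in the lead-byte test plus a final bound on the assembled code point.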