The range from U+E000 to U+F8FF is the Private Use Area, so it is valid
and in use. The code points from U+F900 up to U+FFFF can also be encoded,
as can those beyond the base plane, up to U+10FFFF.
The only big range that is invalid in UTF-8 is the surrogates area,
U+D800 to U+DFFF. Those code points are used by UTF-16 to encode
code points outside the base plane.
See also http://www.ietf.org/rfc/rfc3629.txt
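To make the rule concrete, here is a minimal sketch of the validity check
as I read RFC 3629. The names isScalarValue and utf8Length are my own,
not taken from HXT or any other library.

```haskell
-- | True if the code point may be encoded in UTF-8 according to RFC 3629:
-- anything in 0..0x10FFFF except the surrogates U+D800..U+DFFF.
-- (Hypothetical helper, not part of HXT.)
isScalarValue :: Int -> Bool
isScalarValue cp =
  cp >= 0 && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF)

-- | Number of bytes UTF-8 needs for a code point, or Nothing if the
-- code point cannot be encoded at all.
utf8Length :: Int -> Maybe Int
utf8Length cp
  | not (isScalarValue cp) = Nothing
  | cp < 0x80              = Just 1
  | cp < 0x800             = Just 2
  | cp < 0x10000           = Just 3   -- includes U+E000..U+FFFF
  | otherwise              = Just 4
```

Under this check U+E000 and U+FFFD are both accepted as three-byte
code points, and only U+D800 to U+DFFF are rejected within the base plane.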
/vidar
On 11 June 2007 at 11:49, Uwe Schmidt wrote:
I've got a bug report concerning the UTF decoding
in HXT. I've copied the source containing the bug from the
Haskell Internationalisation Working Group.
I guess this source is also used in other
projects, e.g. darcs.
My question: Is this really a bug, or is it a feature?
My understanding so far was that the interval from
E000 to FFFF is not legal in Unicode.
---------- Forwarded Message ----------
Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
Date: Sunday 10 June 2007 07:40
From: PHO <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Hello,
I've found a bug in Data.Char.UTF8.decodeOne that it fails to decode
UTF-8 letters from U+E000 to U+FFFF. Here is the patch:
{
hunk ./src/Data/Char/UTF8.hs 248
- | b1 < 0xEE = decodeOne_threebyte bs
+ | b1 < 0xF0 = decodeOne_threebyte bs
}
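
For illustration, the boundary the patch moves can be sketched as follows.
This is not the HXT source: seqLength and decodeThree are hypothetical
names, and the checks follow RFC 3629 rather than the existing module.
Three-byte sequences start with a lead byte in 0xE0..0xEF; 0xEE covers
U+E000 to U+EFFF and 0xEF covers U+F000 to U+FFFF, so a test of
b1 < 0xEE leaves exactly that range undecoded.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- | Sequence length implied by a lead byte (hypothetical helper).
seqLength :: Word8 -> Maybe Int
seqLength b1
  | b1 < 0x80 = Just 1
  | b1 < 0xC2 = Nothing   -- continuation bytes and overlong two-byte leads
  | b1 < 0xE0 = Just 2
  | b1 < 0xF0 = Just 3    -- 0xE0..0xEF, including 0xEE and 0xEF
  | b1 < 0xF5 = Just 4
  | otherwise = Nothing

-- | Decode one three-byte sequence, rejecting overlong forms and
-- surrogates (hypothetical helper, assuming the RFC 3629 rules).
decodeThree :: Word8 -> Word8 -> Word8 -> Maybe Int
decodeThree b1 b2 b3
  | b1 .&. 0xF0 /= 0xE0          = Nothing  -- not a three-byte lead
  | not (isCont b2 && isCont b3) = Nothing  -- bad continuation byte
  | cp < 0x800                   = Nothing  -- overlong encoding
  | cp >= 0xD800 && cp <= 0xDFFF = Nothing  -- surrogate
  | otherwise                    = Just cp
  where
    isCont b = b .&. 0xC0 == 0x80
    cp = (fromIntegral (b1 .&. 0x0F) `shiftL` 12)
     .|. (fromIntegral (b2 .&. 0x3F) `shiftL` 6)
     .|. fromIntegral (b3 .&. 0x3F)
```

With these definitions the bytes EE 80 80 decode to U+E000 and EF BF BD
to U+FFFD, while ED A0 80 (which would be U+D800) is rejected.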
--------------------------------------------------------------
Here is the source:
http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs
Any suggestions?
Uwe Schmidt
_______________________________________________
Haskell mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell