The range from U+E000 to U+F8FF is the Private Use Area, so it is valid and in use. There are also valid ranges from U+F900 up to U+FFFF, and beyond.

The only large range that is invalid in UTF-8 is the surrogate area, U+D800 to U+DFFF. These code points are used by UTF-16 to encode code points outside the Basic Multilingual Plane.
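
As a rough illustration of that rule, here is a minimal sketch of the check a decoder could apply to a decoded code point. The name isScalarValue is made up for this example and is not taken from HXT or any other library.

```haskell
-- Hypothetical helper, for illustration only (not part of HXT).
-- A code point may appear in UTF-8 if it lies in U+0000..U+10FFFF
-- and is not in the surrogate area U+D800..U+DFFF.
isScalarValue :: Int -> Bool
isScalarValue cp =
  cp >= 0x0000 && cp <= 0x10FFFF
    && not (cp >= 0xD800 && cp <= 0xDFFF)
```

With this definition, isScalarValue 0xE000 and isScalarValue 0xFFFF are both True, while isScalarValue 0xD800 is False.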

See also http://www.ietf.org/rfc/rfc3629.txt
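
To see why the guard in the quoted patch matters, it may help to compute the three-byte encoding by hand. The sketch below follows the 1110xxxx 10xxxxxx 10xxxxxx layout from RFC 3629. encodeThree is a hypothetical helper written for this mail, not a function from Data.Char.UTF8, and it assumes its argument is already in U+0800..U+FFFF.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))

-- Hypothetical helper, for illustration only (not part of HXT).
-- Encodes a code point in U+0800..U+FFFF as three UTF-8 bytes.
-- No range or surrogate checking is done here.
encodeThree :: Int -> [Int]
encodeThree cp =
  [ 0xE0 .|. (cp `shiftR` 12)            -- lead byte:  1110xxxx
  , 0x80 .|. ((cp `shiftR` 6) .&. 0x3F)  -- second:     10xxxxxx
  , 0x80 .|. (cp .&. 0x3F)               -- third:      10xxxxxx
  ]
```

encodeThree 0xE000 gives [0xEE, 0x80, 0x80] and encodeThree 0xFFFF gives [0xEF, 0xBF, 0xBF], so the lead bytes 0xEE and 0xEF both belong to three-byte sequences. A guard of b1 < 0xEE leaves them out, while b1 < 0xF0 covers them.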

/vidar

On 11 June 2007 at 11:49, Uwe Schmidt wrote:


I've got a bug report concerning the UTF decoding
in HXT. I've copied the source containing the bug from the
Haskell Internationalisation Working Group.
I guess this source is also used in other
projects, e.g. darcs.

My question: is this really a bug, or is it a feature?
My knowledge so far was that the interval from
E000 to FFFF is not legal in Unicode.

----------  Forwarded Message  ----------

Subject: Bug in the HXT: Data.Char.UTF8.decodeOne
Date: Sunday 10 June 2007 07:40
From: PHO <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

Hello,

I've found a bug in Data.Char.UTF8.decodeOne: it fails to decode
UTF-8 characters from U+E000 to U+FFFF. Here is the patch:

{
hunk ./src/Data/Char/UTF8.hs 248
-    | b1 < 0xEE   = decodeOne_threebyte bs
+    | b1 < 0xF0   = decodeOne_threebyte bs
}
--------------------------------------------------------------

Here is the source:
http://darcs.fh-wedel.de/hxt/src/Data/Char/UTF8.hs

Any suggestions?

Uwe Schmidt
_______________________________________________
Haskell mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell
