Re: UTF-8 encode/decode libraries.

Sven Panne Mon, 26 Apr 2004 11:33:49 -0700

Duncan Coutts wrote:

> On Mon, 2004-04-26 at 18:49, David Brown wrote: [...]
> toUTF :: String -> String

Hmmm, "String -> [Word8]" would be nicer...
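Here is a minimal sketch of the suggested `String -> [Word8]` signature. The name `toUTF` comes from the quoted proposal. The helper `encodeChar` and the whole body are illustrative assumptions, not code from the thread. The sketch emits one to four bytes per character and does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Sketch of an encoder with the suggested type: String -> [Word8].
-- Surrogate code points (U+D800..U+DFFF) are NOT rejected here.
toUTF :: String -> [Word8]
toUTF = concatMap encodeChar

-- Hypothetical helper: encode a single code point as 1 to 4 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [byte n]                               -- 0xxxxxxx
  | n < 0x800   = [ 0xC0 .|. byte (n `shiftR` 6)         -- 110xxxxx
                  , cont n ]
  | n < 0x10000 = [ 0xE0 .|. byte (n `shiftR` 12)        -- 1110xxxx
                  , cont (n `shiftR` 6)
                  , cont n ]
  | otherwise   = [ 0xF0 .|. byte (n `shiftR` 18)        -- 11110xxx
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. byte (x .&. 0x3F)
```

Because Haskell's `Char` stops at U+10FFFF, the `otherwise` branch never needs more than four bytes.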

> fromUTF :: String -> String

... and here: "[Word8] -> String" or "[Word8] -> Maybe String".
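With the `[Word8] -> Maybe String` variant, the decoder can report malformed input instead of failing. The sketch below assumes that signature. Its body and the local helper `multi` are hypothetical and not taken from the thread. It rejects stray or missing continuation bytes, truncated sequences, overlong forms, surrogates, and anything above U+10FFFF.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Sketch of a decoder with the type [Word8] -> Maybe String.
-- Returns Nothing for any malformed UTF-8 input.
fromUTF :: [Word8] -> Maybe String
fromUTF [] = Just []
fromUTF (b : bs)
  | b < 0x80  = (chr (fromIntegral b) :) <$> fromUTF bs
  | b < 0xC0  = Nothing                      -- stray continuation byte
  | b < 0xE0  = multi 1 0x1F 0x80            -- 110xxxxx + 1 continuation
  | b < 0xF0  = multi 2 0x0F 0x800           -- 1110xxxx + 2 continuations
  | b < 0xF8  = multi 3 0x07 0x10000         -- 11110xxx + 3 continuations
  | otherwise = Nothing                      -- 5/6-byte forms exceed Unicode
  where
    -- Hypothetical helper: k continuation bytes follow the lead byte b,
    -- mask keeps the payload bits of b, and lo is the smallest code point
    -- that legitimately needs this many bytes (to reject overlong forms).
    multi k mask lo = do
      let (conts, rest) = splitAt k bs
      if length conts /= k || any (\c -> c .&. 0xC0 /= 0x80) conts
        then Nothing
        else do
          let n = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                        (fromIntegral (b .&. mask) :: Int) conts
          if n < lo || n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF)
            then Nothing
            else (chr n :) <$> fromUTF rest
```

Using `Maybe` keeps the decoder total. A caller that wants replacement characters instead could use a variant returning `String` that substitutes U+FFFD.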
Furthermore, UTF-8 is not restricted to a maximum of 3 bytes per character;
here is an excerpt from "man utf8" on my SuSE Linux:

       * UTF-8  encoded  UCS  characters  may  be up to six bytes
         long, however the Unicode standard specifies no  characters
         above  0x10ffff, so Unicode characters can only be up to
         four bytes long in UTF-8.
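The four-byte limit follows from the bit budget. A four-byte sequence has a lead byte of 11110xxx followed by three 10xxxxxx continuation bytes, so it carries 3 + 3*6 = 21 payload bits, which is enough for U+10FFFF. The original six-byte form carried 1 + 5*6 = 31 bits. RFC 3629 later restricted UTF-8 to U+10FFFF, which means at most four bytes. The hypothetical helper `utf8Length` below simply encodes those ranges:

```haskell
import Data.Char (ord)

-- Hypothetical helper: number of UTF-8 bytes needed for a character,
-- using the code point ranges of RFC 3629. Haskell's Char cannot exceed
-- U+10FFFF, so the answer is never more than 4.
utf8Length :: Char -> Int
utf8Length c
  | n <= 0x7F   = 1   -- 7 payload bits
  | n <= 0x7FF  = 2   -- 5 + 6 = 11 payload bits
  | n <= 0xFFFF = 3   -- 4 + 6 + 6 = 16 payload bits
  | otherwise   = 4   -- 3 + 6 + 6 + 6 = 21 payload bits
  where
    n = ord c
```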

IIRC, we discussed encoders/decoders quite some time ago on the libraries
mailing list, but nothing came of it, which is a pity. We should strive
for something more general than UTF-8 <-> UCS/Unicode; there are quite a
few other widely used encodings, e.g. GSM 03.38. Any takers?
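One way to go beyond UTF-8 is a record of encode/decode functions per encoding. The `Codec` type, its field names, and `transcode` below are hypothetical and serve only to show the shape. ISO 8859-1 and US-ASCII are used because their mappings are trivial. A table-driven encoding such as GSM 03.38 would fill in the same record but is not shown here.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface: one record per character encoding.
data Codec = Codec
  { codecName :: String
  , encode    :: String -> Maybe [Word8]  -- Nothing: unrepresentable character
  , decode    :: [Word8] -> Maybe String  -- Nothing: malformed byte sequence
  }

-- ISO 8859-1: code points U+0000..U+00FF map one-to-one onto bytes.
latin1 :: Codec
latin1 = Codec
  { codecName = "ISO-8859-1"
  , encode    = mapM enc
  , decode    = Just . map (chr . fromIntegral)
  }
  where
    enc ch
      | ord ch <= 0xFF = Just (fromIntegral (ord ch))
      | otherwise      = Nothing

-- US-ASCII: only the 7-bit range is valid in either direction.
ascii :: Codec
ascii = Codec
  { codecName = "US-ASCII"
  , encode    = mapM enc
  , decode    = mapM dec
  }
  where
    enc ch
      | ord ch < 0x80 = Just (fromIntegral (ord ch))
      | otherwise     = Nothing
    dec b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing

-- Hypothetical helper: convert bytes between any two codecs via String.
transcode :: Codec -> Codec -> [Word8] -> Maybe [Word8]
transcode from to bytes = decode from bytes >>= encode to
```

Routing everything through `String` means each new encoding needs only its own pair of functions, not a converter for every other encoding.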

Cheers,
   S.

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
