Duncan Coutts wrote:
On Mon, 2004-04-26 at 18:49, David Brown wrote: [...]
toUTF :: String -> String

Hmmm, "String -> [Word8]" would be nicer...


fromUTF :: String -> String

... and here: "[Word8] -> String" or "[Word8] -> Maybe String". Furthermore, UTF-8 is not restricted to a maximum of 3 bytes per character. Here is an excerpt from "man utf8" on my SuSE Linux:

       * UTF-8  encoded  UCS  characters  may  be up to six bytes
         long, however the Unicode standard specifies no  characters
         above  0x10ffff, so Unicode characters can only be up to
         four bytes long in UTF-8.
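
To make the two points above concrete, here is a rough sketch of what the suggested signatures could look like with sequences of up to four bytes. It is only an illustration under my own assumptions, not the code from the original post: the names toUTF and fromUTF are taken from the quoted signatures, the helpers enc, cont, multi and isCont are made up for this example, and the encoder does not reject surrogate code points.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode a String as UTF-8, using 1 to 4 bytes per code point.
-- Assumption: surrogates (U+D800..U+DFFF) are passed through unchecked.
toUTF :: String -> [Word8]
toUTF = concatMap (map fromIntegral . enc . ord)
  where
    enc :: Int -> [Int]
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont :: Int -> Int
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Decode UTF-8. Returns Nothing for truncated, malformed or overlong
-- sequences, for surrogates, and for code points above U+10FFFF.
fromUTF :: [Word8] -> Maybe String
fromUTF [] = Just []
fromUTF (b : bs)
  | b < 0x80           = fmap (chr (fromIntegral b) :) (fromUTF bs)
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000
  | otherwise          = Nothing
  where
    -- n: number of continuation bytes; lead: payload bits of the first
    -- byte; lo: smallest code point allowed for this sequence length.
    multi :: Int -> Word8 -> Int -> Maybe String
    multi n lead lo =
      let (conts, rest) = splitAt n bs
          step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
          cp = foldl step (fromIntegral lead) conts :: Int
          valid = length conts == n && all isCont conts
                  && cp >= lo && cp <= 0x10FFFF
                  && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if valid then fmap (chr cp :) (fromUTF rest) else Nothing
    isCont :: Word8 -> Bool
    isCont x = x .&. 0xC0 == 0x80
```

For instance, toUTF "\x20AC" (the euro sign) gives the three bytes 0xE2 0x82 0xAC, and fromUTF [0xC0, 0x80] gives Nothing because that is an overlong encoding of U+0000. Returning Maybe String is the simplest way to report bad input; a real library would probably want to say where and why decoding failed.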

IIRC we discussed encoders/decoders quite some time ago on the libraries
mailing list, but nothing really happened, which is a pity. We should
strive for something more general than UTF-8 <-> UCS/Unicode: there are
quite a few other widely used encodings, e.g. GSM 03.38. Any takers?

Cheers,
   S.

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
