Re: [Haskell-cafe] UTF-8 in Haskell.

Mark Lentczner Wed, 22 Dec 2010 22:03:29 -0800

On Dec 22, 2010, at 9:29 PM, Magicloud Magiclouds wrote:
> Thus under all situation (ascii, UTF-8, or even
> UTF-32), my program always send 4 bytes through the network. Is that
> OK?


Generally, no.

Haskell strings are sequences of Unicode characters. Each character has an 
integral code point value, from 0 to 0x10ffff, but the code point itself is 
just a number, not a pattern of bits to be exchanged. Mapping code points to 
bits is the job of an encoding.

In any protocol you need to know the encoding before you exchange characters 
as bytes or words. In some protocols it is implicit, in others explicit in a 
header or metadata, and in yet others (IRC comes to mind) it is undefined 
(which causes problems for the user).

The UTF-8 encoding uses a variable number of bytes (one to four) to represent 
each character, depending on the code point, not a fixed Word32 as you 
suggested.
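
To make the variable width concrete, here is a rough sketch of a UTF-8 encoder 
using only "base". The names encodeCharUtf8 and encodeStringUtf8 are made up 
for illustration and do not come from any library; real code should use one of 
the packages mentioned below. It also assumes the input contains no surrogate 
code points (U+D800 to U+DFFF), which a Haskell Char can hold but which are 
not valid in UTF-8.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single character as UTF-8 (hypothetical helper, illustration only).
-- The number of bytes produced depends on the size of the code point.
-- Assumption: the character is not a surrogate; no check is made here.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                           -- 1 byte (ASCII)
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                    -- 2 bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]           -- 3 bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]  -- 4 bytes
  where
    n :: Int
    n = ord c

    -- Leading payload bits, shifted down into the low bits of the first byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)

    -- Continuation byte: the bit pattern 10xxxxxx carrying six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Encode a whole String by concatenating the per-character encodings.
encodeStringUtf8 :: String -> [Word8]
encodeStringUtf8 = concatMap encodeCharUtf8
```

With this sketch, 'A' encodes to the single byte 0x41, while '\x20AC' (the 
euro sign) encodes to the three bytes 0xE2 0x82 0xAC, so sending a fixed four 
bytes per character would not be UTF-8.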

Converting from Haskell's String to various encodings can be done with either 
the "text" package or "utf8-string" package.
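
As a minimal sketch of the "text" route, assuming the "text" and "bytestring" 
packages are available: the wrapper names stringToUtf8 and utf8ToString are 
hypothetical, but pack, unpack, encodeUtf8 and decodeUtf8' are the functions 
those packages export.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Turn a String into the UTF-8 bytes you would put on the wire.
-- (Hypothetical wrapper name; the work is done by pack and encodeUtf8.)
stringToUtf8 :: String -> B.ByteString
stringToUtf8 = TE.encodeUtf8 . T.pack

-- | Decode received bytes, reporting malformed UTF-8 as a Left
-- instead of throwing an exception.
utf8ToString :: B.ByteString -> Either String String
utf8ToString bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right (T.unpack text)
```

Note that the byte length of the result generally differs from the character 
count, so a protocol that needs framing should send the byte length, not the 
String length.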

                - Mark
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
