On Thu, 2010-12-23 at 14:15 +0800, Magicloud Magiclouds wrote:
> On Thu, Dec 23, 2010 at 2:01 PM, Mark Lentczner <ma...@glyphic.com> wrote:
> >
> > On Dec 22, 2010, at 9:29 PM, Magicloud Magiclouds wrote:
> >> Thus under all situation (ascii, UTF-8, or even
> >> UTF-32), my program always send 4 bytes through the network. Is that
> >> OK?
> >
> > Generally, no.
> >
> > Haskell strings are sequences of Unicode characters. Each character has an
> > integral code point value, from 0 to 0x10ffff, but technically, the code
> > point itself is just a number, not a pattern of bits to be exchanged. That
> > is an encoding.
> >
> > In any protocol you need know the encoding before you exchange characters
> > as bytes or words. In some protocols it is implicit, in others explicit in
> > header or meta data, and in yet others (IRC comes to mind) it is undefined
> > (which makes problems for the user).
> >
> > The UTF-8 encoding uses a variable number of bytes to represent each
> > character, depending on the code point, not Word32 as you suggested.
> >
> > Converting from Haskell's String to various encodings can be done with
> > either the "text" package or "utf8-string" package.
> >
> > - Mark
>
> I see. I just realize that, in this case (ssh), I could use CString to
> avoid all problems about encoding.
>
By using CString you may avoid the problems yourself, but only by pushing them onto your users. CString is just char *, and the Foreign marshalling functions just use ASCII. As a user of computer programs who does not speak only English, I ask for Unicode support (for example UTF-8).

That is, unless you mean only commands, not data; in that case you should probably check the details of the protocol. In any case, I don't think CString is the correct approach to network data, and you should probably use ByteString in place of CString.

Regards
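
To make the ByteString suggestion concrete, here is a minimal sketch. It assumes the "text" and "bytestring" packages are available and that the protocol in question specifies UTF-8, which would need to be checked against the protocol itself. The helper names encodeUtf8String and decodeUtf8String are made up for this example and do not come from any library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode a Haskell String as UTF-8 bytes,
-- i.e. the form that would actually be written to the network.
encodeUtf8String :: String -> BS.ByteString
encodeUtf8String = TE.encodeUtf8 . T.pack

-- Hypothetical helper: decode UTF-8 bytes received from the peer.
-- Malformed input yields Left instead of throwing an exception.
decodeUtf8String :: BS.ByteString -> Either String String
decodeUtf8String bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right (T.unpack text)

main :: IO ()
main = do
  -- Each sample is a single character (U+0061, U+00E9, U+20AC, U+1D11E),
  -- but they take 1, 2, 3 and 4 bytes respectively in UTF-8.
  let samples = ["a", "\233", "\8364", "\119070"]
  mapM_ (print . BS.length . encodeUtf8String) samples
  -- Round trip through the byte representation.
  print (decodeUtf8String (encodeUtf8String "na\239ve"))
```

The byte counts differ per character, which is why a fixed four bytes per character is not UTF-8. Keeping the data in a ByteString also keeps the encoding step explicit, rather than hiding it inside the CString marshalling.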
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe