On 12/05/2010 01:56, David Powell wrote:
Greetings,

I am having trouble sending Unicode characters as UTF-8 over a socket handle.
Despite setting the encoding on the socket handle to utf8, it still seems to
use some other encoding when writing to the socket.  It works correctly when
writing to stdout, but not to a socket handle.  I am using GHC 6.12.1 and
network-2.2.1.7.  I can get it to work using System.IO.UTF8, but I was under
the impression that this was no longer necessary.

I also don't seem to understand the interaction between hSetEncoding and
hSetBinaryMode because if I set the binary mode to 'False' and the encoding
to utf8 on the socket, then when writing to the socket the string seems to be
truncated at the first non-ascii codepoint.

Here is a test snippet, which can be used with netcat as a listening server
(i.e. nc -l 1234).

 > import System.IO
 > import Network
 > main = do
 >  let a="λ"
 >  s <- connectTo "127.0.0.1" (PortNumber 1234)
 >  hSetEncoding s utf8
 >  hSetEncoding stdout utf8
 >  hPutStrLn s a
 >  putStrLn a
 >  hClose s

You've found a bug, thanks. The bug is that a socket handle is bidirectional, and we're only setting the encoding for one side (the read side) when we should be setting it for both.

I just created a ticket:

http://hackage.haskell.org/trac/ghc/ticket/4066

Expect a fix in GHC 6.12.3. In the meantime you can work around it by creating a write-only socket handle, which hSetEncoding does work with. For example, this replacement for connectTo worked for me:

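import Control.Exception (bracketOnError)
import Network (PortID(PortNumber))
import Network.BSD (getHostByName, getProtocolNumber, hostAddress)
import Network.Socket
import System.IO (Handle, IOMode(WriteMode))

-- Like Network.connectTo, but returns a write-only handle.
connectTo :: String -> PortID -> IO Handle
connectTo hostname (PortNumber port) = do
    proto <- getProtocolNumber "tcp"
    bracketOnError
        (socket AF_INET Stream proto)
        sClose  -- only done if there's an error
        (\sock -> do
          he <- getHostByName hostname
          connect sock (SockAddrInet port (hostAddress he))
          socketToHandle sock WriteMode
        )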
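
If a read/write handle is needed, another possible stopgap is to do the UTF-8 encoding by hand and write the resulting bytes with the handle in binary mode, so that the handle's text encoding is not involved at all. The following is only a rough sketch using base alone. The names encodeChar, utf8Bytes and hPutStrUtf8 are made up for illustration and are not library functions, and it assumes that hSetBinaryMode on the socket handle is not affected by the same problem, which has not been checked here. It also does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (Handle, hPutStr, hSetBinaryMode)

-- Encode one code point as its UTF-8 byte values (hypothetical helper).
encodeChar :: Char -> [Int]
encodeChar c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- UTF-8 byte values for a whole string (hypothetical helper).
utf8Bytes :: String -> [Int]
utf8Bytes = concatMap encodeChar

-- Put the handle in binary mode and write one Char per encoded byte.
-- In binary mode each Char below 256 is written as a single byte.
hPutStrUtf8 :: Handle -> String -> IO ()
hPutStrUtf8 h s = do
  hSetBinaryMode h True
  hPutStr h (map chr (utf8Bytes s))

main :: IO ()
main = print (utf8Bytes "\955")  -- "\955" is λ; prints [206,187]
```

In the test program above this would mean calling hPutStrUtf8 s (a ++ "\n") in place of hSetEncoding s utf8 followed by hPutStrLn s a.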

Cheers,
        Simon
