[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102882#comment-13102882
 ] 

Camille Fournier commented on ZOOKEEPER-737:
--------------------------------------------

Yeah, I don't know anything about the difference between nc and telnet, or 
what zkdashboard is using, but this is with interactive telnet (reproducing a 
problem we see in zkdashboard). It's reproducible but tricky. If I run 
stat/dump from a remote server against a leader with a lot of 
connections/ephemerals, it reliably fails: it prints some of the data and 
then closes the connection abruptly. For example, the end of a dump command:
14 expire at Mon Sep 12 14:14:04 EDT 2011:
        0x1325208b8c90089
        0x2325208b8c30099
        0x4325208b8ca0113
        0x1325208b8c900cd
        0x2325208b8c3009d
        0x5325211469900ae
        0x2325208b8c30091
        0x4325208b8ca00b7
        0x1325208b8c90094
        0x1325208b8c90090
        0x5325211469900d7
        0x5325211469900a4
        0x2325208b8c3009e
        0x2325208b8c3009c
0 expire at Mon Sep 12 14:14:10 EDT 2011:Connection closed by foreign host.
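
When reproducing this from a script rather than by eye, a rough check on the captured reply can flag a cut-off dump. The sketch below is only a heuristic built from the output shown above: it assumes a complete reply ends with a newline and that each "N expire at ...:" header is followed by N indented session ids. The function names are made up for illustration and are not part of ZooKeeper.

```python
import re

# Header lines in the dump above look like
# "14 expire at Mon Sep 12 14:14:04 EDT 2011:".
BUCKET_HEADER = re.compile(r"^(\d+) expire at (.+):$")
# Session ids are indented hex values such as "        0x1325208b8c90089".
SESSION_ID = re.compile(r"^\s+(0x[0-9a-fA-F]+)$")


def parse_expiry_buckets(text):
    """Return a list of (declared_count, [session ids]), one per bucket header."""
    buckets = []
    for line in text.splitlines():
        header = BUCKET_HEADER.match(line)
        if header:
            buckets.append((int(header.group(1)), []))
            continue
        session = SESSION_ID.match(line)
        if session and buckets:
            buckets[-1][1].append(session.group(1))
    return buckets


def looks_truncated(text):
    """Heuristic: True if the reply is empty, ends mid-line, or a bucket is short."""
    # Assumption: a complete reply ends with a newline.
    if not text.endswith("\n"):
        return True
    return any(count != len(ids) for count, ids in parse_expiry_buckets(text))
```

On the raw bytes behind the capture above (without telnet's own "Connection closed by foreign host." message), the last header has no trailing newline, so `looks_truncated` would return True.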

I tried a bit of debugging against the running server. A breakpoint anywhere 
inside the dump thread before the exit of closeSock() makes the problem go 
away, but with a breakpoint at the exit of closeSock() the problem still 
shows up. 

This is "a lot" of ephemerals/sessions in that we're talking about ~120 
sessions and 88 ephemerals. Hardly thousands.
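
To take the nc half-close described in the quoted issue text out of the picture when reproducing, a test client can send the command and keep its write side open until the server closes the connection. This is a minimal sketch under that assumption; `four_letter_word` is a hypothetical helper name, and it would not necessarily avoid the truncation reported here, since that was seen with telnet, which does not half-close either.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        # Deliberately no sock.shutdown(socket.SHUT_WR) here: unlike nc, the
        # write side stays open until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `print(four_letter_word("localhost", 2181, "dump"))` would print whatever the server returned before it closed the socket, complete or not.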

> some 4 letter words may fail with netcat (nc)
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-737
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.0
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Blocker
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
> ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
> ZOOKEEPER-737.patch, ZOOKEEPER-737.patch
>
>
> nc closes the write channel as soon as it has sent its information, for 
> example "echo stat|nc localhost 2181".
> In general this is fine; however, the server code will close the socket as 
> soon as it receives notice that nc has
> closed its write channel. If not all of the 4 letter word result has been 
> written back to the client yet, this will cause
> some or all of the result to be lost, i.e. the client will not see the full 
> result. This was introduced in 3.3.0 as part
> of a change to reduce blocking of the selector by long running 4letter words.
> Here's an example of the logs from the server during this:
> echo -n stat | nc localhost 2181
> 2010-04-09 21:55:36,124 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - 
> Accepted socket connection from /127.0.0.1:42179
> 2010-04-09 21:55:36,124 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing 
> stat command from /127.0.0.1:42179
> 2010-04-09 21:55:36,125 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - 
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x0, likely client has closed socket
> 2010-04-09 21:55:36,125 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed 
> socket connection for client /127.0.0.1:42179 (no session established for 
> client)
> [phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR 
> [Thread-15:NIOServerCnxn@422] - Unexpected Exception: 
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
>       at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>       at java.io.PrintWriter.flush(PrintWriter.java:276)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
> 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - 
> Thread Thread[Thread-15,5,main] died
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
>       at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>       at java.io.PrintWriter.flush(PrintWriter.java:276)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira