[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109067#comment-13109067
 ] 

Camille Fournier commented on ZOOKEEPER-737:
--------------------------------------------

I've taken a lot of wrong steps on this one, but I think I have a solution.

Here's the problem as I see it: We can't cancel that selector key, because if 
we do that, the socket close will cut off the connection to the client before 
all the data has flushed through the network. I have validated that the data is 
all flushed to the byte buffer, but it never gets to the client if there is 
network slowness and it can't all successfully arrive before the CommandThread 
gets to the call to close. This can happen even in nc noninteractive mode (as 
we have observed with several false weekend alerts).
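
For anyone who wants to reproduce the client side without nc, the half-close can be approximated from plain Java. This is only a sketch: HalfCloseClientSketch and sendFourLetterWord are names made up for illustration, and it assumes the server answers on the same connection and then closes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ZooKeeper: behaves roughly like
// "echo -n stat | nc host port" by half-closing after the command is sent.
class HalfCloseClientSketch {
    static String sendFourLetterWord(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            // Close only our write side; the server now sees end-of-stream on read
            // while we keep reading its response.
            s.shutdownOutput();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            // Read until the server closes its side of the connection.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        }
    }
}
```

If the server closes the socket as soon as it sees the end-of-stream, the returned string may be truncated or empty, which is the symptom described above.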

However, if we don't cancel the selector key, we'll potentially see a cancelled 
key exception or an EOF exception in our selector loops in non-interactive 
netcat, and that will prematurely close the socket.

So we need to keep the key from being cancelled by our processes to ensure the 
data gets completely sent, but we also need to ignore errors from selecting 
this key in the case of a 4lw. 

Right now, my solution is a total ugly hack that looks something like:

Create a boolean in NIOServerCnxn called "ignoreClose", initially set to false. 
In checkFourLetterWord, set this boolean to true. In the Factory run loop, if 
we get a CancelledKeyException on a key with a NIOServerCnxn attachment whose 
boolean is set to true, ignore the exception. In doIO, if we get an EOFException 
(possibly any exception) and this boolean is set to true, ignore the exception. 
This lets us ignore the effects of a cancelled/closed incoming connection from 
nc without losing data on the socket when a 4lw has a lot of data to send or 
a long distance to send it.
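
To make the shape of the hack concrete, here is a rough Java sketch. It is not the patch and not the real NIOServerCnxn: CnxnSketch, IgnoreCloseSketch, selectOnce, and the boolean parameters that simulate an EOF or a cancelled key are all made up for illustration. Marking the flag volatile is an assumption here, on the grounds that it may be read from more than one thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.CancelledKeyException;

// Hypothetical stand-in for NIOServerCnxn; only models the flag and close state.
class CnxnSketch {
    // Proposed flag: false for normal connections, true once a 4lw is seen.
    volatile boolean ignoreClose = false;
    boolean closed = false;

    // Stand-in for checkFourLetterWord: mark this connection as serving a 4lw.
    void checkFourLetterWord() {
        ignoreClose = true;
    }

    void close() {
        closed = true;
    }

    // Stand-in for doIO; clientHalfClosed simulates a read that hits end-of-stream.
    void doIO(boolean clientHalfClosed) {
        try {
            if (clientHalfClosed) {
                throw new EOFException("client closed its write side");
            }
        } catch (IOException e) {
            if (ignoreClose) {
                return; // 4lw: leave the socket open so the response can drain
            }
            close();
        }
    }
}

class IgnoreCloseSketch {
    // Stand-in for one pass of the Factory run loop over a single key;
    // keyCancelled simulates the key having been cancelled underneath us.
    static void selectOnce(CnxnSketch cnxn, boolean keyCancelled) {
        try {
            if (keyCancelled) {
                throw new CancelledKeyException();
            }
            cnxn.doIO(false);
        } catch (CancelledKeyException e) {
            if (cnxn != null && cnxn.ignoreClose) {
                return; // 4lw attachment: ignore the cancelled key
            }
            if (cnxn != null) {
                cnxn.close();
            }
        }
    }

    public static void main(String[] args) {
        CnxnSketch normal = new CnxnSketch();
        normal.doIO(true);
        System.out.println("normal connection closed: " + normal.closed);

        CnxnSketch flw = new CnxnSketch();
        flw.checkFourLetterWord();
        flw.doIO(true);
        selectOnce(flw, true);
        System.out.println("4lw connection closed: " + flw.closed);
    }
}
```

In this sketch the 4lw connection is never closed by either error path, so something else (presumably the thread that writes the response) would still have to close it once the data has been sent.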

Thoughts on this? It has been a bit of a nightmare to figure out, and my 
googling seems to indicate that netty won't support nc at all ( 
https://issues.jboss.org/browse/NETTY-236 ). 

> some 4 letter words may fail with netcat (nc)
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-737
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.0
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Blocker
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
> ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
> ZOOKEEPER-737.patch, ZOOKEEPER-737.patch
>
>
> nc closes the write channel as soon as it has sent its information, for 
> example "echo stat|nc localhost 2181".
> In general this is fine; however, the server code will close the socket as 
> soon as it receives notice that nc has
> closed its write channel. If not all of the 4 letter word result has been 
> written back to the client yet, this will cause
> some or all of the result to be lost, i.e. the client will not see the full 
> result. This was introduced in 3.3.0 as part
> of a change to reduce blocking of the selector by long running 4 letter words.
> here's an example of the logs from the server during this
> echo -n stat | nc localhost 2181
> 2010-04-09 21:55:36,124 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - 
> Accepted socket connection from /127.0.0.1:42179
> 2010-04-09 21:55:36,124 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing 
> stat command from /127.0.0.1:42179
> 2010-04-09 21:55:36,125 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - 
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x0, likely client has closed socket
> 2010-04-09 21:55:36,125 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed 
> socket connection for client /127.0.0.1:42179 (no session established for 
> client)
> [phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR 
> [Thread-15:NIOServerCnxn@422] - Unexpected Exception: 
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
>       at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>       at java.io.PrintWriter.flush(PrintWriter.java:276)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
> 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - 
> Thread Thread[Thread-15,5,main] died
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
>       at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>       at java.io.PrintWriter.flush(PrintWriter.java:276)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
