[
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642655#comment-14642655
]
Rakesh R commented on ZOOKEEPER-832:
------------------------------------
[~abranzyck], good work!
I have a few comments on the proposed patch:
# At present, the connected server validates the revalidation request and
closes the session only if that server is the 'leader' or 'standalone'. I see a
gap here. Say there are three servers A, B and C, and C is the leader. Assume a
client whose connection string contains only A (for example, a zkCli.sh shell).
That client will still retry indefinitely, right? IMHO we could make a few
improvements to handle this situation. Can we modify the algorithm as follows:
#* case-1) If the connected server is 'standalone', log an error and close the
session.
#* case-2) Else if the connected server is the 'leader', log an error and close
the session.
#* case-3) Else the connected server is a 'follower/observer', so forward the
request to the leader to reopen the session. Invoke Leader.REVALIDATE with
lastZxidSeen: {{getLearner().validateSession(cnxn, sessionId, sessionTimeout,
lastZxidSeen);}}. On the other side,
[LearnerHandler.java#L553|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L553]
can do the validation and close the session.
{code}
if (qp.getZxid() > leader.zk.getLastProcessedZxid()) {
    leader.zk.closeSession(id);
    valid = false;
    // add error logs
} else {
{code}
# Please add a test case that verifies the behavior in a quorum.
# Please remove unused imports from the test class.
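To make the three cases above easier to discuss, here is a minimal, self-contained sketch of the decision. It is not ZooKeeper code: {{RevalidationSketch}}, {{ServerRole}}, {{Action}}, {{onConnectedServer}} and {{leaderAccepts}} are all hypothetical names, and it assumes the decision only matters when the client's lastZxidSeen is ahead of the connected server's last processed zxid. The zxids in {{main}} (0x44 and 0x0) are taken from the server log in the issue description.

```java
class RevalidationSketch {
    /** Hypothetical stand-in for the role of the server the client connected to. */
    enum ServerRole { STANDALONE, LEADER, FOLLOWER, OBSERVER }

    /** Hypothetical outcomes; these are not ZooKeeper types. */
    enum Action { PROCEED_NORMALLY, LOG_AND_CLOSE_SESSION, FORWARD_TO_LEADER }

    /**
     * Decision on the connected server. Simplifying assumption: if the client
     * is not ahead of this server, nothing special happens.
     */
    static Action onConnectedServer(ServerRole role, long lastZxidSeen,
                                    long localLastProcessedZxid) {
        if (lastZxidSeen <= localLastProcessedZxid) {
            return Action.PROCEED_NORMALLY;
        }
        switch (role) {
            case STANDALONE: // case-1: no other server can ever satisfy the client
            case LEADER:     // case-2: the leader holds the authoritative zxid
                return Action.LOG_AND_CLOSE_SESSION;
            default:         // case-3: follower/observer lets the leader decide
                return Action.FORWARD_TO_LEADER;
        }
    }

    /**
     * Leader-side check mirroring the proposed LearnerHandler condition:
     * the session stays valid only if the client is not ahead of the leader.
     */
    static boolean leaderAccepts(long lastZxidSeen, long leaderLastProcessedZxid) {
        return lastZxidSeen <= leaderLastProcessedZxid;
    }

    public static void main(String[] args) {
        long clientZxid = 0x44L; // zxid the client has seen
        long serverZxid = 0x0L;  // server state after its data directory was wiped
        for (ServerRole role : ServerRole.values()) {
            System.out.println(role + " -> "
                    + onConnectedServer(role, clientZxid, serverZxid));
        }
        System.out.println("leader keeps session: "
                + leaderAccepts(clientZxid, serverZxid));
    }
}
```

The point of case-3 in this sketch is that a follower or observer never closes the session on its own, since it may simply be lagging; only the leader's comparison is treated as final.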
> Invalid session id causes infinite loop during automatic reconnect
> ------------------------------------------------------------------
>
> Key: ZOOKEEPER-832
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.5, 3.5.0
> Environment: All
> Reporter: Ryan Holmes
> Priority: Blocker
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch,
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch,
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch,
> ZOOKEEPER-832.patch
>
>
> Steps to reproduce:
> 1.) Connect to a standalone server using the Java client.
> 2.) Stop the server.
> 3.) Delete the contents of the data directory (i.e. the persisted session
> data).
> 4.) Start the server.
> The client now automatically tries to reconnect but the server refuses the
> connection because the session id is invalid. The client and server are now
> in an infinite loop of attempted and rejected connections. While this
> situation represents a catastrophic failure and the current behavior is not
> incorrect, it appears that there is no way to detect this situation on the
> client and therefore no way to recover.
> The suggested improvement is to send an event to the default watcher
> indicating that the current state is "session invalid", similar to how the
> "session expired" state is handled.
> Server log output (repeats indefinitely):
> 2010-08-05 11:48:08,283 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] -
> Accepted socket connection from /127.0.0.1:63292
> 2010-08-05 11:48:08,284 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing
> session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last
> zxid is 0x0 client must try another server
> 2010-08-05 11:48:08,284 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed
> socket connection for client /127.0.0.1:63292 (no session established for
> client)
> Client log output (repeats indefinitely):
> 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 -
> Opening socket connection to server localhost/127.0.0.1:2181
> 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session
> 0x12a3ae4e893000a for server null, unexpected error, closing socket
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring
> exception during shutdown input
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring
> exception during shutdown output
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)