[
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787949#comment-13787949
]
Germán Blanco commented on ZOOKEEPER-832:
-----------------------------------------
Hello Ben,
thank you for your help. After reading your comment, I am thinking now in a
different line. Please help me understand how this works.
- We have a client that has seen a higher zxid than the follower to which it is
trying to connect.
- This might be because a really sick situation in which the transactions for
that client were lost or just a timing problem.
- If this is a really sick situation, then we want to detect it. Otherwise just
sync and continue.
- First question: Could the sync be solved by issuing a sync from the follower
to the leader?
- Second question: Is there a reliable way to detect the very sick situation
always? Because there might be one possibility if we use the check in
ZOOKEEPER-1777.
- Then we have to take the decision of what to do if the really sick situation
is detected. Session information should be reset. The client should be informed
somehow. And the general contract of how ZooKeeper works has been broken, so
depending on how ZooKeeper is used in some systems it may even make sense to
halt the system and report a fatal failure, and others will be happy with a
warning and keep working (which would be my case).
Not easy to solve at all. Did I get this right?
> Invalid session id causes infinite loop during automatic reconnect
> ------------------------------------------------------------------
>
> Key: ZOOKEEPER-832
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
> Project: ZooKeeper
> Issue Type: Improvement
> Components: c client, java client
> Affects Versions: 3.3.1
> Environment: Mac OS X 10.6.4
> JVM 1.6.0_20
> Reporter: Ryan Holmes
> Assignee: Germán Blanco
> Fix For: 3.5.0
>
>
> Steps to reproduce:
> 1.) Connect to a standalone server using the Java client.
> 2.) Stop the server.
> 3.) Delete the contents of the data directory (i.e. the persisted session
> data).
> 4.) Start the server.
> The client now automatically tries to reconnect but the server refuses the
> connection because the session id is invalid. The client and server are now
> in an infinite loop of attempted and rejected connections. While this
> situation represents a catastrophic failure and the current behavior is not
> incorrect, it appears that there is no way to detect this situation on the
> client and therefore no way to recover.
> The suggested improvement is to send an event to the default watcher
> indicating that the current state is "session invalid", similar to how the
> "session expired" state is handled.
> Server log output (repeats indefinitely):
> 2010-08-05 11:48:08,283 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] -
> Accepted socket connection from /127.0.0.1:63292
> 2010-08-05 11:48:08,284 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing
> session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last
> zxid is 0x0 client must try another server
> 2010-08-05 11:48:08,284 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed
> socket connection for client /127.0.0.1:63292 (no session established for
> client)
> Client log output (repeats indefinitely):
> 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 -
> Opening socket connection to server localhost/127.0.0.1:2181
> 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session
> 0x12a3ae4e893000a for server null, unexpected error, closing socket
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring
> exception during shutdown input
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring
> exception during shutdown output
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
--
This message was sent by Atlassian JIRA
(v6.1#6144)