[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832373#comment-13832373
 ] 

Benjamin Reed commented on ZOOKEEPER-832:
-----------------------------------------

i think the solution to this is to use the dbid. it would fix a couple of 
things.

the dbid is supposed to be a globally unique id that tags the zookeeper data. 
we have it in the file headers, but we never really finished integrating it. 
one key purpose was to make it so the clients could always know that they are 
talking to a server that is using the same zookeeper data instance as all the 
other servers it has been talking too. it would also ensure that the zookeeper 
servers are using the same db instance. in addition i think it could be 
extended to your purposes.

for issue 1) the nice thing about using a ramdisk is that everything gets lost 
on reboot, so if you don't have anything you know it. now when a server tries 
to connect to the ensemble it can signal that it will not participate in quorum 
establishment because its dbid is 0. once it connects to an established leader 
it can sync with the leader and take on its dbid.

for issue 2) we would need to add the dbid to ConnectRequest (something that 
we've needed to do anyway). if the client connects to a server with a different 
dbid, the server can reliably tell the client that the zookeeper instance that 
is was connected to is gone and it should close its handle. this would work 
even if the server has a later zxid than the last seen by the client.

> Invalid session id causes infinite loop during automatic reconnect
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-832
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1, 3.4.5, 3.5.0
>         Environment: All
>            Reporter: Ryan Holmes
>            Assignee: Germán Blanco
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch
>
>
> Steps to reproduce:
> 1.) Connect to a standalone server using the Java client.
> 2.) Stop the server.
> 3.) Delete the contents of the data directory (i.e. the persisted session 
> data).
> 4.) Start the server.
> The client now automatically tries to reconnect but the server refuses the 
> connection because the session id is invalid. The client and server are now 
> in an infinite loop of attempted and rejected connections. While this 
> situation represents a catastrophic failure and the current behavior is not 
> incorrect, it appears that there is no way to detect this situation on the 
> client and therefore no way to recover.
> The suggested improvement is to send an event to the default watcher 
> indicating that the current state is "session invalid", similar to how the 
> "session expired" state is handled.
> Server log output (repeats indefinitely):
> 2010-08-05 11:48:08,283 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
> Accepted socket connection from /127.0.0.1:63292
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
> session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
> zxid is 0x0 client must try another server
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
> socket connection for client /127.0.0.1:63292 (no session established for 
> client)
> Client log output (repeats indefinitely):
> 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
> Opening socket connection to server localhost/127.0.0.1:2181
> 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
> 0x12a3ae4e893000a for server null, unexpected error, closing socket 
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
> exception during shutdown input
> java.nio.channels.ClosedChannelException
>       at 
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>       at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
> exception during shutdown output
> java.nio.channels.ClosedChannelException
>       at 
> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>       at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to