anmolnar edited a comment on issue #1048: ZOOKEEPER-3188: Improve resilience to 
network
URL: https://github.com/apache/zookeeper/pull/1048#issuecomment-540011376
 
 
   I uploaded the logs of the failing Follower here: 
https://pastebin.com/LsXYiRKt
   
   It was running on a Mac and the situation was as previously described:
   1. 2 interfaces were running: wifi and cable,
   2. cable unplugged,
   3. wifi disabled, cable plugged back in
   
   After the 3rd step we had to wait approximately 1 minute for the quorum to 
come back up. We believe this was because, at the first exception:
   ```
   2019-10-09 13:49:43,744 [myid:1] - WARN  
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@127]
 - Exception when following the leader
   java.net.SocketTimeoutException: Read timed out
   ```
   the Follower shuts down and restarts leader election, but `QuorumCnxManager` 
still believes the connections are up. After about a minute it finally gets a 
SocketException here:
   ```
   2019-10-09 13:50:37,709 [myid:1] - WARN  
[RecvWorker:3:QuorumCnxManager$RecvWorker@1336] - Connection broken for id 3, 
my id = 1, error =
   java.net.SocketException: Operation timed out (Read failed)
   ```
   and shuts down all Send/Recv workers. The delay occurs because the read 
timeout on that socket is infinite, to prevent the leader election port from 
being shut down when no traffic is transmitted. By this point leader election 
had raised the notification timeout to approx. 1 minute, so we have to wait 
quite long for notifications to be resent.
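   
   To illustrate how the notification timeout can grow to about a minute, here 
is a minimal sketch. It assumes the election doubles its wait after every poll 
that returns no notification and caps it at 60 seconds. The constants and the 
class and method names are illustrative assumptions, not ZooKeeper's actual 
code:
   ```java
   public class NotificationBackoffSketch {
       // Assumed starting wait and cap (hypothetical values for illustration).
       static final int INITIAL_TIMEOUT_MS = 200;
       static final int MAX_TIMEOUT_MS = 60_000;

       // Double the current wait, but never exceed the cap.
       static int nextTimeout(int currentMs) {
           return Math.min(currentMs * 2, MAX_TIMEOUT_MS);
       }

       // Count how many empty polls it takes to reach the cap.
       static int roundsUntilCap() {
           int timeout = INITIAL_TIMEOUT_MS;
           int rounds = 0;
           while (timeout < MAX_TIMEOUT_MS) {
               timeout = nextTimeout(timeout);
               rounds++;
           }
           return rounds;
       }

       public static void main(String[] args) {
           int timeout = INITIAL_TIMEOUT_MS;
           // Print each wait until the cap is reached.
           while (timeout < MAX_TIMEOUT_MS) {
               System.out.println("wait " + timeout + " ms");
               timeout = nextTimeout(timeout);
           }
           System.out.println("capped at " + timeout + " ms");
       }
   }
   ```
   Under these assumptions, once the cap is reached a peer that missed a 
notification may wait up to a full minute before notifications are resent, 
which would match the recovery delay seen above.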
   
   If only a single node is failing, the quorum is still up, so I believe it's 
not a big deal. But if we think about the failure of an entire switch, which 
could take down the whole ensemble at the same time, this could be too long to 
recover.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
