[jira] [Commented] (ZOOKEEPER-2938) Server is unable to join quorum after connection broken to other peers

Coran Stow (Jira) Thu, 10 Aug 2023 23:27:04 -0700


    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753063#comment-17753063
 ]


Coran Stow commented on ZOOKEEPER-2938:
---------------------------------------

I hit this issue on a client site where quorum formation was hit and miss, so I 
had a look through the code.

The Peers try to ensure that they only have one connection between any pair of 
servers in the peer group to manage the quorum formation, and they do this by 
dropping the connection when the server that initiates the connection finds 
that the server it has connected to has a higher zkid. If the server that 
initiates the connection finds that it has a higher zkid then it maintains the 
connection. That's where we get:
Have smaller server identifier, so dropping the connection:
This is expected behaviour. The problem is actually that the server with the 
higher zkid fails to sustain a connection to the server that is emitting this 
error.

As part of resolving the peer list for the quorum, each node scans the initial 
configuration and if it notices that the ID of the server it reads from the 
config is its own zkid, it stores its own hostname(s). It seems this was done 
as part of https://issues.apache.org/jira/browse/ZOOKEEPER-107 to allow a 
server to discover its own hostname(s) and communicate them to the other nodes 
to allow for dynamic reconfiguration.

So, when the quorum manager initiates a connection to a peer, in sends its 
hostnames to that peer. 
([here|https://github.com/apache/zookeeper/blob/15f29b51a22bc51b9d6074cb7f3e72bb00a9753a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L466])

When a quorum manager successfully _receives_ a connection from a peer and it 
has a higher zkid than the server it has received a connection from it 
_immediately_ initiates a connection back to the sender, using the zkid and 
hostname(s) sent in the sender's initial connection. If this all happens as the 
receiving server is starting up, it will interfere with that server's attempts 
to connect to other servers using its initial configuration because the quorum 
manager won't attempt to connect to another server if it's already attempting 
to connect to a server with that same zkid. 
([here|https://github.com/apache/zookeeper/blob/15f29b51a22bc51b9d6074cb7f3e72bb00a9753a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L597])

In an environment where the hostname that a Zookeeper server discovers for 
itself is different from the hostname other servers use to discover it, eg when 
your ensemble spans multiple Kubernetes clusters, the hostname that a node 
discovers for itself might not be suitable for other nodes to try and resolve. 
If the node discovers its hostname is 
{{{}zookeeper1.zookeeper.zookeeper-us-east-dc.svc.cluster.local{}}}, it can 
pass that on to other nodes in the ensemble who may then fail to connect 
because they're in a different Kubernetes cluster. 

Setting the hostname to "{{{}0.0.0.0{}}}" works because Zookeeper can bind to 
that IP address, but there is no hostname to pass on to other servers. Or more 
technically, it passes its hostname as {{null}} to other servers which then 
handle the {{null}} address by using the address they already have in their 
configuration. 

 

> Server is unable to join quorum after connection broken to other peers
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2938
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2938
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6, 3.4.14
>            Reporter: Abhay Bothra
>            Priority: Major
>
> We see the following logs in the node with {{myid: 1}}
> {code}
> 2017-11-08 15:06:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:06:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:07:28,375 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:07:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:07:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:08:28,375 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:08:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:08:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:10:28,376 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:10:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:10:28,377 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> {code}
> On the nodes with {{myid: 2}} and {{myid: 3}}, we see connection broken 
> events for {{myid: 1}}
> {code}
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, 
> my id = 2, error =
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:209)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.net.SocketInputStream.read(SocketInputStream.java:223)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>         at 
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
>         at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> {code}
> From the reported occurrences, it looks like this is a problem only when the 
> node with the smallest {{myid}} loses connection.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ZOOKEEPER-2938) Server is unable to join quorum after connection broken to other peers

Reply via email to