[
https://issues.apache.org/jira/browse/ZOOKEEPER-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330421#comment-17330421
]
Klibi commented on ZOOKEEPER-2938:
----------------------------------
[~edwiny] Could you please explain more what you mean "when the zookeeper
ensemble is deployed across the same nodes as the broker cluster".
Do you mean that this happens when kafka and zookeeper nodes are on the same VM
?
> Server is unable to join quorum after connection broken to other peers
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-2938
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2938
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.6, 3.4.14
> Reporter: Abhay Bothra
> Priority: Major
>
> We see the following logs in the node with {{myid: 1}}
> {code}
> 2017-11-08 15:06:28,375 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (2, 1)
> 2017-11-08 15:06:28,375 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (3, 1)
> 2017-11-08 15:07:28,375 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:07:28,375 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (2, 1)
> 2017-11-08 15:07:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (3, 1)
> 2017-11-08 15:08:28,375 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:08:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (2, 1)
> 2017-11-08 15:08:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (3, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:09:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (2, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (3, 1)
> 2017-11-08 15:10:28,376 [myid:1] - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:10:28,376 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (2, 1)
> 2017-11-08 15:10:28,377 [myid:1] - INFO
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier,
> so dropping the connection: (3, 1)
> {code}
> On the nodes with {{myid: 2}} and {{myid: 3}}, we see connection broken
> events for {{myid: 1}}
> {code}
> 2017-11-07 02:54:32,135 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1,
> my id = 2, error =
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:209)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.net.SocketInputStream.read(SocketInputStream.java:223)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2017-11-07 02:54:32,135 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2017-11-07 02:54:32,135 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting
> for message on queue
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
> 2017-11-07 02:54:32,135 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
> {code}
> From the reported occurrences, it looks like this is a problem only when the
> node with the smallest {{myid}} loses connection.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)