[
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068032#comment-17068032
]
Lasaro Camargos commented on ZOOKEEPER-3769:
--------------------------------------------
I went back and looked into some older logs and could confirm that the
WorkerReceiver died and that's what caused the election to hang. However, the
BufferUnderflowException was present in very few instances. Most of the time,
it was a NegativeArraySizeException that was caught, but pretty much in the
same situation, that is, after the connection being broken to node3. The
following are excerpts from node1 and node 3. Let me know if you would like to
have a look at the full logs.
03/23/20 10:14:45,772
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:166) -
Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout
40000 datadir /company/service/log/version-2 snapdir
/company/service/data/version-2
03/23/20 10:14:45,772
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.Learner] (Follower.java:69) - FOLLOWING -
LEADER ELECTION TOOK - 9 MS
03/23/20 10:14:45,774
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:202) -
Resolved address for companydemo3.snc4.companyinc.com:
companydemo3.snc4.companyinc.com/172.22.64.148
03/23/20 10:14:45,793
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UNKNOWN17
500000000 null
03/23/20 10:14:45,798
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i DIFF
40000001f null
03/23/20 10:14:45,799
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:391) - Getting a
diff from the leader 0x40000001f
03/23/20 10:14:45,801
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i NEWLEADER
500000000 null
03/23/20 10:14:45,801
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:546) - Learner
received NEWLEADER message
03/23/20 10:14:45,815
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UPTODATE
ffffffffffffffff null
03/23/20 10:14:45,816
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:529) - Learner
received UPTODATE message
03/23/20 10:14:45,816
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:1916) -
Reconfig feature is disabled, skip reconfig processing.
03/23/20 10:14:45,817
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.CommitProcessor] (CommitProcessor.java:256)
- Configuring CommitProcessor with 32 worker threads.
03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] INFO
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:924) - Received connection request 172.22.30.98:58472
03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000]
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1038) - Address of remote peer: 3
03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000]
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 3
03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000]
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=3
03/23/20 10:14:46,065 [SendWorker:3] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue
java.lang.InterruptedException: null
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
~[?:?]
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
~[?:?]
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
~[?:?]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
~[zookeeper-3.5.7.jar:3.5.7]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
~[zookeeper-3.5.7.jar:3.5.7]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
~[zookeeper-3.5.7.jar:3.5.7]
03/23/20 10:14:46,065 [SendWorker:3] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1114) - Attempting to send lastMessage to sid=3
03/23/20 10:14:46,066 [SendWorker:3] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 3
03/23/20 10:14:46,066 [WorkerReceiver[myid=1]] INFO
[org.apache.zookeeper.server.quorum.FastLeaderElection]
(FastLeaderElection.java:267) - Backward compatibility mode (28 bits), server
id: 3
03/23/20 10:14:46,065 [RecvWorker:3] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1227) - Connection broken for id 3, my id = 1, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[?:?]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
~[zookeeper-3.5.7.jar:3.5.7]
03/23/20 10:14:46,066 [SendWorker:3] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1153) - Send worker leaving thread id 3 my id = 1
03/23/20 10:14:46,066 [WorkerReceiver[myid=1]] INFO
[org.apache.zookeeper.server.quorum.FastLeaderElection]
(FastLeaderElection.java:309) - Backward compatibility mode (before reconfig),
server id: 3
03/23/20 10:14:46,066 [RecvWorker:3] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1230) - Interrupting SendWorker
03/23/20 10:14:46,066 [RecvWorker:3] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 3
03/23/20 10:14:46,066 [WorkerReceiver[myid=1]] DEBUG
[org.apache.zookeeper.server.quorum.FastLeaderElection]
(FastLeaderElection.java:332) - Receive new notification message. My id = 1
03/23/20 10:14:46,067 [WorkerReceiver[myid=1]] INFO
[org.apache.zookeeper.server.quorum.FastLeaderElection]
(FastLeaderElection.java:679) - Notification: 0 (message format version), 0
(n.leader), 0x583c5800d6340d (n.zxid), 0xb71421b100000000 (n.round), FOLLOWING
(n.state), 3 (n.sid), 0x583c58 (n.peerEPoch), FOLLOWING (my state)
03/23/20 10:14:46,127 [WorkerReceiver[myid=1]] ERROR
[org.apache.zookeeper.server.NIOServerCnxnFactory]
(NIOServerCnxnFactory.java:92) - Thread Thread[WorkerReceiver[myid=1],5,main]
died
java.lang.NegativeArraySizeException: -1082130432
at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:276)
~[zookeeper-3.5.7.jar:3.5.7]
at java.lang.Thread.run(Thread.java:835) ~[?:?]
03/23/20 10:14:48,729 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] INFO
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:924) - Received connection request 172.22.24.91:44072
03/23/20 10:14:48,729 [companydemo1.snc4.companyinc.com/172.22.65.65:4000]
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1038) - Address of remote peer: 1
03/23/20 10:14:48,730 [RecvWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1227) - Connection broken for id 1, my id = 1, error =
java.io.EOFException: null
at java.io.DataInputStream.readInt(DataInputStream.java:397) ~[?:?]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
~[zookeeper-3.5.7.jar:3.5.7]
03/23/20 10:14:48,731 [RecvWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1230) - Interrupting SendWorker
03/23/20 10:14:48,731 [RecvWorker:1] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 1
03/23/20 10:14:48,732 [RecvWorker:1] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=1
03/23/20 10:14:48,732 [SendWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue
java.lang.InterruptedException: null
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
~[?:?]
Node 3
03/23/20 10:14:45,796 [LearnerHandler-/172.22.65.65:60166] DEBUG
[org.apache.zookeeper.server.quorum.LearnerHandler] (LearnerHandler.java:480) -
Sending NEWLEADER message to 1
03/23/20 10:14:45,797 [Sender-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o DIFF
40000001f null
03/23/20 10:14:45,797 [Sender-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o DIFF
40000001f null
03/23/20 10:14:45,798 [Sender-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o
NEWLEADER 500000000 null
03/23/20 10:14:45,798 [Sender-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o
NEWLEADER 500000000 null
03/23/20 10:14:45,804 [LearnerHandler-/172.22.65.65:60166] DEBUG
[org.apache.zookeeper.server.quorum.LearnerHandler] (LearnerHandler.java:513) -
Received NEWLEADER-ACK message from 1
03/23/20 10:14:45,804 [LearnerHandler-/172.22.64.47:44648] DEBUG
[org.apache.zookeeper.server.quorum.LearnerHandler] (LearnerHandler.java:513) -
Received NEWLEADER-ACK message from 2
03/23/20 10:14:45,804
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.Leader] (Leader.java:1296) - Have quorum of
supporters, sids: [ [1, 3],[1, 3] ]; starting up and setting last processed
zxid: 0x500000000
03/23/20 10:14:45,805
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:1916) -
Reconfig feature is disabled, skip reconfig processing.
03/23/20 10:14:45,806
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x104c41214550000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x104c41214550000 40000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x204c3f1cea10000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x204c3f1cea10000 40000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x304c3eda5410000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x304c3eda5410000 40000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x204c3ef8a220000
03/23/20 10:14:45,807
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x204c3ef8a220000 40000
03/23/20 10:14:45,808
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x304c3eb60c30000
03/23/20 10:14:45,808
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x304c3eb60c30000 40000
03/23/20 10:14:45,808
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x104c40fcfd70000
03/23/20 10:14:45,808
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x104c40fcfd70000 40000
03/23/20 10:14:45,810
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.quorum.CommitProcessor] (CommitProcessor.java:256)
- Configuring CommitProcessor with 32 worker threads.
03/23/20 10:14:45,812
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.PrepRequestProcessor]
(PrepRequestProcessor.java:99) - zookeeper.skipACL=="yes", ACL checks will be
skipped
03/23/20 10:14:45,814
[QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO
[org.apache.zookeeper.server.ContainerManager] (ContainerManager.java:64) -
Using checkIntervalMs=60000 maxPerMinute=10000
03/23/20 10:14:45,815 [LearnerHandler-/172.22.65.65:60166] DEBUG
[org.apache.zookeeper.server.quorum.LearnerHandler] (LearnerHandler.java:534) -
Sending UPTODATE message to 1
03/23/20 10:14:45,815 [LearnerHandler-/172.22.64.47:44648] DEBUG
[org.apache.zookeeper.server.quorum.LearnerHandler] (LearnerHandler.java:534) -
Sending UPTODATE message to 2
03/23/20 10:14:45,815 [Sender-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o
UPTODATE ffffffffffffffff null
03/23/20 10:14:45,815 [Sender-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - o
UPTODATE ffffffffffffffff null
03/23/20 10:14:45,816 [LearnerHandler-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - i ACK
500000000 null
03/23/20 10:14:45,816 [LearnerHandler-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.Leader] (Leader.java:849) - Ack zxid:
0x500000000
03/23/20 10:14:45,816 [LearnerHandler-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.LearnerHandler] (ZooTrace.java:71) - i ACK
500000000 null
03/23/20 10:14:45,816 [LearnerHandler-/172.22.64.47:44648] TRACE
[org.apache.zookeeper.server.quorum.Leader] (Leader.java:855) - outstanding
proposals all
03/23/20 10:14:45,816 [LearnerHandler-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.Leader] (Leader.java:849) - Ack zxid:
0x500000000
03/23/20 10:14:45,817 [LearnerHandler-/172.22.65.65:60166] TRACE
[org.apache.zookeeper.server.quorum.Leader] (Leader.java:855) - outstanding
proposals all
03/23/20 10:14:46,065 [RecvWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1227) - Connection broken for id 1, my id = 3, error =
java.io.EOFException: null
at java.io.DataInputStream.readInt(DataInputStream.java:397) ~[?:?]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
~[zookeeper-3.5.7.jar:3.5.7]
03/23/20 10:14:46,065 [RecvWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1230) - Interrupting SendWorker
03/23/20 10:14:46,065 [RecvWorker:1] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 1
03/23/20 10:14:46,066 [RecvWorker:1] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=1
03/23/20 10:14:46,066 [SendWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue
java.lang.InterruptedException: null
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
~[?:?]
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
~[?:?]
at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
~[zookeeper-3.5.7.jar:3.5.7]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
~[zookeeper-3.5.7.jar:3.5.7]
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
~[zookeeper-3.5.7.jar:3.5.7]
03/23/20 10:14:46,066 [SendWorker:1] DEBUG
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1055) - Calling finish for 1
03/23/20 10:14:46,066 [SendWorker:1] WARN
[org.apache.zookeeper.server.quorum.QuorumCnxManager]
(QuorumCnxManager.java:1153) - Send worker leaving thread id 1 my id = 3
03/23/20 10:14:50,290 [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:2181]
DEBUG [org.apache.zookeeper.server.NIOServerCnxnFactory]
(NIOServerCnxnFactory.java:296) - Accepted socket connection from
/172.22.64.148:41132
03/23/20 10:14:50,295 [NIOWorkerThread-1] DEBUG
[org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:1001) -
Session establishment request from client /172.22.64.148:41132 client's
lastZxid is 0x0
03/23/20 10:14:50,295 [NIOWorkerThread-1] DEBUG
[org.apache.zookeeper.server.SessionTrackerImpl] (SessionTrackerImpl.java:274)
- Adding session 0x304c3f1b81c0000
03/23/20 10:14:50,295 [NIOWorkerThread-1] TRACE
[org.apache.zookeeper.server.SessionTrackerImpl] (ZooTrace.java:71) -
SessionTrackerImpl --- Adding session 0x304c3f1b81c0000 40000
03/23/20 10:14:50,296 [NIOWorkerThread-1] DEBUG
[org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:1052) -
Client attempting to establish new session: session = 0x304c3f1b81c0000, zxid =
0x0, timeout = 60000, address = /172.22.64.148:41132
03/23/20 10:14:50,296 [ProcessThread(sid:3 cport:-1):] TRACE
[org.apache.zookeeper.server.PrepRequestProcessor] (ZooTrace.java:88) -
:Psessionid:0x304c3f1b81c0000 type:createSession cxid:0x0
zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> fast leader election does not end if leader is taken down
> ---------------------------------------------------------
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.5.7
> Reporter: Lasaro Camargos
> Assignee: Mate Szalay-Beko
> Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and
> this config
>
> tickTime=2000
> initLimit=30
> syncLimit=3
> dataDir=/company/service/data
> dataLogDir=/company/service/log
> clientPort=2181
> snapCount=100000
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> skipACL=yes
> preAllocSize=65536
> maxClientCnxns=0
> 4lw.commands.whitelist=*
> admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
> server.2=companydemo2.snc4.companyinc.com:3000:4000
> server.3=companydemo3.snc4.companyinc.com:3000:4000
>
> Could you have a look at the logs and help me figure this out? It seems like
> node 1 is not getting notifications back from node2, but I don't see anything
> wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could
> be causing it.
>
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)