[
https://issues.apache.org/jira/browse/ZOOKEEPER-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291319#comment-17291319
]
Alex Mirgorodskiy commented on ZOOKEEPER-4220:
----------------------------------------------
Yep, this is what we are seeing. We forcibly shut down the current leader, and
one of the remaining instances occasionally runs into two back-to-back timeouts
(2 x 1.5s) trying to connect to the downed leader:
{quote}2020-12-13T22:46:30.997+0000 [.WorkerReceiver[myid=6]] Notification: 2
(message format version), 1 (n.leader), 0x700003804 (n.zxid), 0x4 (n.round),
LOOKING (n.state), 1 (n.sid), 0x7 (n.peerEPoch), FOLLOWING (my state)7000000ad
(n.config version)
2020-12-13T22:46:32.503+0000 [.WorkerSender[myid=6]] Cannot open channel to 4
at election address /10.80.140.226:3888java.net.SocketTimeoutException: connect
timed out\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)\n\tat
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat
java.net.Socket.connect(Socket.java:589)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)\n\tat
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:478)\n\tat
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:457)\n\tat
java.lang.Thread.run(Thread.java:745)
2020-12-13T22:46:34.006+0000 [.WorkerSender[myid=6]] Cannot open channel to 4
at election address /10.80.140.226:3888java.net.SocketTimeoutException: connect
timed out\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)\n\tat
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat
java.net.Socket.connect(Socket.java:589)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:712)\n\tat
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)\n\tat
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:478)\n\tat
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:457)\n\tat
java.lang.Thread.run(Thread.java:745)
{quote}
Sometimes this pair of connection attempts repeats, perhaps when the remaining
live instances disagree on the election round. In those cases the election does
not seem to converge.
And yes, we are using dynamic reconfig (but not at the time of the crash, I
believe).
Thank you for making the change!
> Redundant connection attempts during leader election if quorum members changed
> ------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4220
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.9, 3.6.2
> Reporter: Alex Mirgorodskiy
> Assignee: Mate Szalay-Beko
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.10, 3.6.3, 3.7.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We've seen a few failures or long delays in electing a new leader when the
> previous leader's host is hard-reset (as opposed to only the service process
> going down, in which case connection attempts do not have to wait for a
> timeout). Symptoms are similar to
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164. Reducing cnxTimeout
> from 5 to 1.5 seconds makes the problem much less frequent, but doesn't fix
> it completely. We are still using an old ZooKeeper version (3.5.5), and the
> new async connect feature will presumably avoid the problem.
> But we noticed a pattern of twice the expected number of connection attempts
> to the same downed instance in the log, and it appears to be due to a code
> glitch in QuorumCnxManager.java:
>
> {code:java}
> synchronized void connectOne(long sid) {
>     ...
>     if (lastCommittedView.containsKey(sid)) {
>         knownId = true;
>         if (connectOne(sid, lastCommittedView.get(sid).electionAddr))
>             return;
>     }
>     if (lastSeenQV != null && lastProposedView.containsKey(sid)
>             && (!knownId || (lastProposedView.get(sid).electionAddr !=  // <----
>                     lastCommittedView.get(sid).electionAddr))) {
>         knownId = true;
>         if (connectOne(sid, lastProposedView.get(sid).electionAddr))
>             return;
>     }
> {code}
> The electionAddr values should presumably be compared with !equals() rather
> than !=. Otherwise connectOne will be invoked an extra time even in the
> common case where the two addresses match.
> The code around it has changed recently, but the check itself still exists at
> the top of master. It might not matter as much with the async connects, but
> perhaps it helps even then.
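As a rough illustration of the comparison issue described in the quoted report, the following self-contained sketch counts connection attempts to a downed peer when the committed and proposed views hold equal but distinct address objects. The class ConnectAttemptSketch, the tryConnect helper, and the plain maps are hypothetical stand-ins for illustration only; this is not the ZooKeeper code, and the real logic lives in QuorumCnxManager.connectOne.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

public class ConnectAttemptSketch {
    // Counts simulated connection attempts (hypothetical helper state).
    static int attempts;

    // Stand-in for a connect to a downed host: always fails.
    static boolean tryConnect(long sid, InetSocketAddress addr) {
        attempts++;
        return false;
    }

    // Simplified version of the check from the report. useEquals selects
    // between reference comparison (!=) and value comparison (!equals).
    static int connectOne(long sid,
                          Map<Long, InetSocketAddress> committed,
                          Map<Long, InetSocketAddress> proposed,
                          boolean useEquals) {
        attempts = 0;
        boolean knownId = false;
        if (committed.containsKey(sid)) {
            knownId = true;
            if (tryConnect(sid, committed.get(sid))) {
                return attempts;
            }
        }
        if (proposed.containsKey(sid)) {
            InetSocketAddress p = proposed.get(sid);
            InetSocketAddress c = committed.get(sid);
            boolean differs = useEquals ? !p.equals(c) : p != c;
            if (!knownId || differs) {
                tryConnect(sid, p);
            }
        }
        return attempts;
    }

    // Builds two views that contain equal addresses held in distinct objects.
    static int demo(boolean useEquals) {
        Map<Long, InetSocketAddress> committed = new HashMap<>();
        Map<Long, InetSocketAddress> proposed = new HashMap<>();
        // createUnresolved avoids any DNS lookup; the address is taken from the log above.
        committed.put(4L, InetSocketAddress.createUnresolved("10.80.140.226", 3888));
        proposed.put(4L, InetSocketAddress.createUnresolved("10.80.140.226", 3888));
        return connectOne(4L, committed, proposed, useEquals);
    }

    public static void main(String[] args) {
        System.out.println("attempts with != : " + demo(false));
        System.out.println("attempts with !equals : " + demo(true));
    }
}
```

Under these assumptions, the reference comparison treats the two equal addresses as different and triggers a second attempt, which would match the doubled timeouts (2 x 1.5s) seen in the log, while the equals-based comparison makes only one attempt.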