Alex Mirgorodskiy created ZOOKEEPER-3591:
--------------------------------------------
Summary: Inconsistent resync with dynamic reconfig
Key: ZOOKEEPER-3591
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3591
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.5
Reporter: Alex Mirgorodskiy
Attachments: instance1.log.gz, instance6.log.gz
We've run into a problem where one of the ZooKeeper instances lost most of its
data after its zk process was restarted. We suspect an interaction between
dynamic reconfiguration and snapshot-based resync of that instance. Details and
some amateurish analysis are below. We can also upload transaction logs, if
need be.
We have a 6-instance ensemble running version 3.5.5 with 3 quorum members and 3
observers. One of the observers (Instance 6) saw its db shrink from 3162 znodes
down to 10 after that instance restarted:
> 2019-10-13T16:44:19.060+0000 [.zk-monitor-0] Monitor command mntr: zk_version
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3162
> --
> 2019-10-13T16:48:32.713+0000 [.zk-monitor-0] Monitor command mntr: zk_version
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 10
Contrast this with Instance 1, which was the leader at the time and whose
znode_count remained stable around 3200:
> 2019-10-13T16:44:48.625+0000 [.zk-monitor-0] Monitor command mntr: zk_version
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3178
> --
> ...
> --
> 2019-10-13T16:48:48.731+0000 [.zk-monitor-0] Monitor command mntr: zk_version
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3223
It appears that the problem had happened 30 minutes earlier, when Instance 6
got resynced from the leader via the Snap method, yet skipped creating an
on-disk snapshot. The end result was that the in-memory state was fine, but
there was only the primordial snapshot.0 on disk, and transaction logs only
started after the missing snapshot:
$ ls -l version-2
> total 1766
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 09:14 acceptedEpoch
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 10:12 currentEpoch
> -rw-r--r-- 1 daautomation daautomation 2097168 Oct 13 09:44 log.6000002e0
> -rw-r--r-- 1 daautomation daautomation 1048592 Oct 13 10:09 log.600001f1b
> -rw-r--r-- 1 daautomation daautomation 4194320 Oct 13 12:16 log.600003310
> -rw-r--r-- 1 daautomation daautomation 770 Oct 13 09:14 snapshot.0
So the zk reboot wiped out most of the state.
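To spell out why that on-disk state is fatal: as we understand the recovery path, a
restarting server loads the newest snapshot it can find and then replays only the
transaction-log entries whose zxid is newer than that snapshot. Below is a small
self-contained sketch of that rule (our own toy model with made-up names, not
ZooKeeper code). With nothing but the primordial snapshot.0 on disk, every change
between zxid 0x1 and 0x6000002df is unrecoverable:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of snapshot + txn-log recovery on restart (not actual ZooKeeper code).
public class RecoverySketch {
    // A "snapshot" is the last zxid it covers plus the znode count it contains.
    record Snapshot(long lastZxid, int znodeCount) {}
    // A "txn" is just a zxid that creates one znode, for illustration.
    record Txn(long zxid) {}

    static int restore(List<Snapshot> snapshots, List<Txn> txnLog) {
        // 1. Load the newest snapshot found on disk.
        Snapshot newest = snapshots.stream()
                .max(Comparator.comparingLong(Snapshot::lastZxid))
                .orElseThrow();
        int znodes = newest.znodeCount();
        // 2. Replay only txns strictly newer than the snapshot's last zxid.
        for (Txn t : txnLog) {
            if (t.zxid() > newest.lastZxid()) {
                znodes++;
            }
        }
        return znodes;
    }

    public static void main(String[] args) {
        // Only snapshot.0 exists (~10 znodes from the initial config), and the
        // surviving txn logs start at 0x6000002e0.
        List<Snapshot> snapshots = List.of(new Snapshot(0x0L, 10));
        List<Txn> txnLog = new ArrayList<>();
        for (long z = 0x6000002e0L; z < 0x6000002e0L + 50; z++) {
            txnLog.add(new Txn(z));
        }
        // Everything created between 0x1 and 0x6000002df only ever existed in
        // memory (via the SNAP resync) and is gone after the restart.
        System.out.println("znodes after restart: " + restore(snapshots, txnLog));
    }
}
{code}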
Dynamic reconfig might be relevant here. Instance 6 started as an observer, got
removed, and immediately re-added as a participant. Instance 2 went the other
way, from participant to observer:
> 2019-10-13T16:14:19.323+0000 ZK reconfig: removing node 6
> 2019-10-13T16:14:19.359+0000 ZK reconfig: adding
> server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181
> 2019-10-13T16:14:19.399+0000 ZK reconfig: adding
> server.2=10.80.209.131:2888:3888:observer;0.0.0.0:2181
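For reference, the logs don't show what actually drove these reconfig calls. One
plausible shape, assuming a client on the 3.5 admin API, is an incremental reconfig
that removes server 6 and then immediately re-adds it as a participant; the connect
string, timeout and exact server spec below are placeholders:
{code:java}
import org.apache.zookeeper.admin.ZooKeeperAdmin;
import org.apache.zookeeper.data.Stat;

public class ReconfigSequence {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string and session timeout.
        ZooKeeperAdmin admin = new ZooKeeperAdmin("10.80.209.131:2181", 30000, event -> { });
        try {
            // Incremental reconfig: drop server 6 ...
            admin.reconfigure(null, "6", null, -1, new Stat());
            // ... then immediately re-add it, this time as a participant.
            admin.reconfigure(
                    "server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181",
                    null, null, -1, new Stat());
        } finally {
            admin.close();
        }
    }
}
{code}
Issued back to back like this, both changes can reach Instance 6 while it is still
in the middle of its resync, which is where the analysis below picks up.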
Looking at the logs, Instance 6 started and received a resync snapshot from the
leader right before the dynamic reconfig:
> 2019-10-13T16:14:19.284+0000
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Getting a
> snapshot from leader 0x6000002dd
> ...
> 2019-10-13T16:14:19.401+0000
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Got zxid
> 0x6000002de expected 0x1
Had it processed the NEWLEADER packet afterwards, it would've persisted the
snapshot locally. But there's no NEWLEADER message in the Instance 6 log.
Instead, there's a "changes proposed in reconfig" exception, likely a result of
the instance getting dynamically removed and re-added as a participant:
> 2019-10-13T16:14:19.467+0000
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Becoming a
> non-voting participant
> 2019-10-13T16:14:19.467+0000
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Exception when
> observing the leader
> java.lang.Exception: changes proposed in reconfig
>     at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:506)
>     at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:74)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1258)
Perhaps the NEWLEADER packet was still in the socket, but queued behind
INFORMANDACTIVATE, and the exception raised while handling INFORMANDACTIVATE
prevented NEWLEADER from ever being processed?
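To make the suspected ordering concrete, here is a heavily simplified paraphrase of
the learner-side sync loop as we read it (our own sketch with invented names; only
the packet-type names mirror the real protocol constants). The point is that a SNAP
only populates the in-memory database and sets a flag, the on-disk snapshot is taken
when NEWLEADER is handled, and an exception thrown while handling an earlier packet,
such as INFORMANDACTIVATE during this reconfig, ends the sync before the snapshot
ever reaches disk:
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Toy paraphrase of the learner-side sync loop (not the real Learner code).
public class SyncSketch {
    enum PacketType { SNAP, PROPOSAL, COMMIT, INFORMANDACTIVATE, NEWLEADER, UPTODATE }

    static boolean inMemoryDbSynced = false;
    static boolean snapshotOnDisk = false;

    static void syncWithLeader(Queue<PacketType> packetsFromLeader) throws Exception {
        boolean snapshotNeeded = false;
        for (PacketType p : packetsFromLeader) {
            switch (p) {
                case SNAP:
                    // Deserialize the leader's snapshot straight into memory;
                    // nothing is written to disk yet.
                    inMemoryDbSynced = true;
                    snapshotNeeded = true;
                    break;
                case INFORMANDACTIVATE:
                    // A reconfig that changes this peer's role aborts the sync.
                    throw new Exception("changes proposed in reconfig");
                case NEWLEADER:
                    // Only here would the resync snapshot reach disk.
                    if (snapshotNeeded) {
                        snapshotOnDisk = true;
                    }
                    break;
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) {
        // The order Instance 6 plausibly saw: SNAP, then INFORMANDACTIVATE,
        // with NEWLEADER still queued behind it.
        Queue<PacketType> packets = new ArrayDeque<>();
        packets.add(PacketType.SNAP);
        packets.add(PacketType.INFORMANDACTIVATE);
        packets.add(PacketType.NEWLEADER);
        try {
            syncWithLeader(packets);
        } catch (Exception e) {
            System.out.println("sync aborted: " + e.getMessage());
        }
        System.out.println("in-memory db synced: " + inMemoryDbSynced); // true
        System.out.println("snapshot on disk:    " + snapshotOnDisk);   // false
    }
}
{code}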
Also, it may or may not be related, but this area got changed recently as part
of https://issues.apache.org/jira/browse/ZOOKEEPER-3104.