Wenhan Feng created ZOOKEEPER-4903: -------------------------------------- Summary: An Inconsistency of `lastProcessed` variable in Spec and Impl Key: ZOOKEEPER-4903 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4903 Project: ZooKeeper Issue Type: Bug Components: other Affects Versions: 3.10.0 Reporter: Wenhan Feng
I have noticed that in the specification of ZKV3_7_0, it appears that the `SyncFollower` action uses an incorrect `lastProcessed`. In `SyncFollower`, the following code is used to determine which sync mode should be employed: ``` lastProcessedZxid == lastProcessed[i].zxid minCommittedIdx == lastSnapshot[i].index + 1 maxCommittedIdx == IF zabState[i] = BROADCAST THEN lastCommitted[i].index ELSE Len(history[i]) committedLogEmpty == minCommittedIdx > maxCommittedIdx minCommittedLog == IF committedLogEmpty THEN lastProcessedZxid ELSE history[i][minCommittedIdx].zxid maxCommittedLog == IF committedLogEmpty THEN lastProcessedZxid ELSE IF maxCommittedIdx = 0 THEN << 0, 0>> ELSE history[i][maxCommittedIdx].zxid ``` We can observe that `lastProcessedZxid` is assigned the value of `lastProcessed[i].zxid`, where `i` is the leader node ID and `lastProcessed` is a global variable in this specification. According to the comment in the specification, the `lastProcessed` variable means "The index and zxid of the last processed transaction in history". And we can see that it will be updated in the `StartZkServer(i)` Action with the following code: ``` LastProposed(i) == IF Len(history[i]) = 0 THEN [ index |-> 0, zxid |-> <<0, 0>> ] ELSE LET lastIndex == Len(history[i]) entry == history[i][lastIndex] IN [ index |-> lastIndex, zxid |-> entry.zxid ] StartZkServer(i) == LET latest == LastProposed(i) IN /\ lastCommitted' = [lastCommitted EXCEPT ![i] = latest] /\ lastProcessed' = [lastProcessed EXCEPT ![i] = latest] /\ lastSnapshot' = [lastSnapshot EXCEPT ![i] = latest] /\ UpdateElectionVote(i, acceptedEpoch[i]) ``` We know that when a cluster finishes an election but before receiving any user requests, according to the specification, it will contain no history. And in `StartZKServer(i)`, `lastSnapshot[i]` will be assigned `[ index |-> 0, zxid |-> <<0, 0>> ]`. However, in the implementation code, I found that the `lastProcessed` used in the `LearnHanler.syncFollower()` function is actually the `dataTree.lastProcessedZxid` variable. And this variable is updated with the following code when `ZkServer` is starting: ``` Leader.java : Line 657: zk.setZxid(ZxidUtils.makeZxid(epoch, 0)); Leader.java : Line 1609: zk.getZKDatabase().setlastProcessedZxid(zk.getZxid()); ``` Therefore, when a cluster finishes an election but before receiving any user requests, its value is indeed `<<1, 0>>`. This causes the following `LeaderSyncFollower` action to choose an incorrect sync model as per the specification. I found that the specification already seems to use a variable, namely `tempMaxEpoch`, to record `zk.hzxid` (which `zk.getZxid()` actually retrieves). It seems that we should use `tempMaxEpoch[i]` in the `StartZkServer` action. And I think the correct code should be: ``` LastProposed(i) == IF Len(history[i]) = 0 THEN [ index |-> 0, zxid |-> <<Maximum({tempMaxEpoch[i], 0}), 0>> ] ...... StartZkServer(i) == LET latest == LastProposed(i) IN ...... /\ lastProcessed' = [lastProcessed EXCEPT ![i] = latest] ...... ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)