GlenGeng commented on a change in pull request #2184:
URL: https://github.com/apache/ozone/pull/2184#discussion_r621131003
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -294,6 +294,14 @@ public void pause() {
getLifeCycle().transition(LifeCycle.State.PAUSED);
}
+ @Override
+ public void reinitialize() {
+ if (getLifeCycleState() == LifeCycle.State.PAUSED) {
Review comment:
After `notifyInstallSnapshotFromLeader` is called, Ratis runs:
```
if (reply != null) {
  LOG.info("{}: StateMachine successfully installed snapshot index {}. Reloading the StateMachine.",
      getMemberId(), reply.getIndex());
  stateMachine.pause();
  state.updateInstalledSnapshotIndex(reply);
  state.reloadStateMachine(reply.getIndex());
}
```
`stateMachine.pause();` puts the SM into the PAUSED state, and
`state.reloadStateMachine(reply.getIndex())` then triggers
`StateMachineUpdater#reload()`, which in turn calls
`stateMachine.reinitialize();`.
So `reinitialize()` is invoked while the SM is PAUSED, which is the reason for the fix.
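To make the call order concrete, here is a minimal standalone sketch of the pause-then-reinitialize sequence described above. It does not use the Ratis or Ozone classes: `PauseReloadSketch`, `SketchStateMachine`, `reloadAfterSnapshotInstall` and the counter are hypothetical names, and the assumption that `reinitialize()` resumes the SM to RUNNING (and does nothing when the SM is not PAUSED) is mine, not something the diff shows.

```java
// Hypothetical stand-in for a state machine lifecycle; not the Ratis or Ozone API.
class PauseReloadSketch {
  enum State { RUNNING, PAUSED }

  static class SketchStateMachine {
    private State state = State.RUNNING;
    private int reinitCount = 0;

    State getState() { return state; }
    int getReinitCount() { return reinitCount; }

    // Mirrors the effect of stateMachine.pause(): the SM ends up PAUSED.
    void pause() { state = State.PAUSED; }

    // Assumed behavior: only a PAUSED SM is reloaded and resumed;
    // in any other state the call is a no-op.
    void reinitialize() {
      if (state == State.PAUSED) {
        reinitCount++;
        state = State.RUNNING;
      }
    }
  }

  // Same call order as the quoted snippet: pause first, then the reload
  // path, which ends in reinitialize().
  static void reloadAfterSnapshotInstall(SketchStateMachine sm) {
    sm.pause();
    sm.reinitialize();
  }

  public static void main(String[] args) {
    SketchStateMachine sm = new SketchStateMachine();
    reloadAfterSnapshotInstall(sm);
    System.out.println(sm.getState() + ", effective reinitialize calls: " + sm.getReinitCount());
  }
}
```

The point of the guard is only that `reinitialize()` is reached with the SM already PAUSED, so the PAUSED check is the branch that actually runs after a snapshot install.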
As far as I know, at the end of
`TestSCMInstallSnapshotWithHA#testInstallSnapshot`, the followerSCM is also left in the
PAUSED state, which was not checked before.
As for `testInstallOldCheckpointFailure`, `notifyInstallSnapshotFromLeader`
is not actually called: without the fix in RATIS-1369, downloading a snapshot
taken at index 0 is ignored by the follower SCM.