Hi folks,

I’ve discovered a bug in the installSnapshot RPC handler that causes the 
follower to reply with success even though the installation actually failed.

org.apache.ratis.server.storage.SnapshotManager.java

public void installSnapshot(StateMachine stateMachine,
    InstallSnapshotRequestProto request) throws IOException {
  ...
  if (snapshotChunkRequest.getDone()) {
    LOG.info("Install snapshot is done, renaming tnp dir:{} to:{}",
        tmpDir, dir.getStateMachineDir());
    dir.getStateMachineDir().delete(); // Here delete() may fail
    tmpDir.renameTo(dir.getStateMachineDir());
  }
}


After the follower receives the entire snapshot, it first stores the files in 
a tmp dir and then renames that dir to StateMachineDir. However, when 
StateMachineDir is not empty, delete() fails, and renameTo() then fails as 
well. Neither return value is checked, so no error is raised. In this 
scenario, the latest snapshot file remains in the tmp dir and the state 
machine cannot load this snapshot.
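
The silent failure can be reproduced outside Ratis with plain java.io.File 
calls. The following is only a minimal sketch: the class name, directory 
names and snapshot file names are made up for illustration, and the result 
of renameTo() is platform dependent according to its Javadoc, so the program 
prints it rather than assuming it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-alone demo, not Ratis code.
class SilentRenameDemo {

    /** Returns { result of delete(), result of renameTo() }. */
    static boolean[] run() throws IOException {
        File root = Files.createTempDirectory("snapshot-demo").toFile();
        File stateMachineDir = new File(root, "sm");
        File tmpDir = new File(root, "tmp");
        stateMachineDir.mkdir();
        tmpDir.mkdir();

        // An old snapshot that was retained, and a newly received one.
        new File(stateMachineDir, "snapshot.old").createNewFile();
        new File(tmpDir, "snapshot.new").createNewFile();

        // Same two calls as in the handler. Both report failure only
        // through their boolean return value; no exception is thrown.
        boolean deleted = stateMachineDir.delete();
        boolean renamed = tmpDir.renameTo(stateMachineDir);
        return new boolean[] { deleted, renamed };
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = run();
        System.out.println("delete() returned " + result[0]);
        System.out.println("renameTo() returned " + result[1]);
    }
}
```

Since File.delete() only removes empty directories, the first call returns 
false here, and the caller never notices unless it inspects the return value.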

StateMachineDir can be non-empty because previously installed snapshots are 
stored there and may not have been cleaned up yet under the retention policy. 
The problem therefore appears the next time the leader installs a snapshot.
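
One possible way to surface the failure is to check both return values and 
throw an IOException, so that the handler cannot reply success after a failed 
rename. This is only a sketch under my own assumptions: replaceDir and the 
class name are hypothetical, not existing Ratis APIs, and it does not decide 
what should happen to the retained old snapshots.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Ratis.
class CheckedSnapshotRename {

    /**
     * Replaces targetDir with tmpDir and throws instead of failing silently.
     * A non-empty targetDir is reported as an error rather than removed.
     */
    static void replaceDir(File tmpDir, File targetDir) throws IOException {
        if (targetDir.exists() && !targetDir.delete()) {
            throw new IOException("Failed to delete " + targetDir
                + " (the directory may not be empty)");
        }
        if (!tmpDir.renameTo(targetDir)) {
            throw new IOException("Failed to rename " + tmpDir
                + " to " + targetDir);
        }
    }
}
```

With something like this in place, the IOException would propagate out of 
installSnapshot(), which already declares throws IOException.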

Thanks!

William Song
Apache IoTDB 
