Hi William,

Thanks a lot for reporting the bug. Could you file a JIRA?

Tsz-Wo

On Mon, Apr 11, 2022 at 4:24 PM 宋子阳 <[email protected]> wrote:

> Hi folks,
>
> I’ve discovered a bug in the installSnapshot RPC handler that causes the
> follower to reply success when the installation actually failed.
>
> org.apache.ratis.server.storage.SnapshotManager.java
>
> public void installSnapshot(StateMachine stateMachine,
>     InstallSnapshotRequestProto request) throws IOException {
>   ...
>   if (snapshotChunkRequest.getDone()) {
>     LOG.info("Install snapshot is done, renaming tnp dir:{} to:{}",
>         tmpDir, dir.getStateMachineDir());
>     dir.getStateMachineDir().delete(); // Here delete() may fail
>     tmpDir.renameTo(dir.getStateMachineDir());
>   }
> }
>
> After the follower receives the entire snapshot data, it first stores
> the files in a tmp dir, then renames that dir to StateMachineDir. However,
> when StateMachineDir is not empty, delete() will fail, and renameTo() will
> fail too. Neither return value is checked, so in this scenario the latest
> snapshot file remains in the tmp dir and the state machine cannot fetch
> this snapshot.
>
> StateMachineDir can be non-empty because previously installed snapshots
> are stored there and may not have been cleaned up due to the retention
> policy. The next time the leader wants to install a snapshot, this
> situation will occur.
>
> Thanks!
>
> William Song
> Apache IoTDB
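
For reference, the failure mode described above can be reproduced outside Ratis with a small sketch. Everything here is hypothetical and only for illustration: the class SnapshotDirSwap and its two methods are not part of Ratis, and this is not a proposed patch. uncheckedSwap mirrors the delete()/renameTo() pattern from the report and ignores the result of delete(); checkedSwap uses java.nio.file.Files, which throws an IOException instead of returning false, so the caller could no longer report success by accident. How old snapshots in StateMachineDir should be handled is a separate question that this sketch does not try to answer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Ratis.
final class SnapshotDirSwap {

  // Mirrors the pattern from the report: java.io.File signals failure
  // through boolean return values, which are easy to ignore.
  static boolean uncheckedSwap(File tmpDir, File stateMachineDir) {
    // delete() returns false when the directory is not empty; result ignored.
    stateMachineDir.delete();
    // renameTo() is platform-dependent; on common platforms it returns false
    // when the target is an existing non-empty directory.
    return tmpDir.renameTo(stateMachineDir);
  }

  // Same two steps, but failures surface as exceptions.
  static void checkedSwap(Path tmpDir, Path stateMachineDir) throws IOException {
    // Throws DirectoryNotEmptyException (an IOException) if files remain.
    Files.deleteIfExists(stateMachineDir);
    // Throws if the move cannot be completed.
    Files.move(tmpDir, stateMachineDir);
  }
}
```

With a non-empty target directory, uncheckedSwap leaves the new snapshot in the tmp dir without raising anything, while checkedSwap throws before touching the tmp dir, so the handler would have a chance to reply with a failure.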
