Hi William,

Thanks a lot for reporting the bug.  Could you file a JIRA?

Tsz-Wo


On Mon, Apr 11, 2022 at 4:24 PM 宋子阳 <[email protected]> wrote:

> Hi folks,
>
> I’ve discovered a bug in the installSnapshot RPC handler that causes the
> follower to reply with success when the installation actually failed.
>
> org.apache.ratis.server.storage.SnapshotManager.java
>
> public void installSnapshot(StateMachine stateMachine,
>     InstallSnapshotRequestProto request) throws IOException {
>   ...
>   if (snapshotChunkRequest.getDone()) {
>     LOG.info("Install snapshot is done, renaming tnp dir:{} to:{}",
>         tmpDir, dir.getStateMachineDir());
>     dir.getStateMachineDir().delete(); // Here delete() may fail
>     tmpDir.renameTo(dir.getStateMachineDir());
>   }
> }
>
>
> After the follower receives the entire snapshot, it first stores the files
> in a tmp dir and then renames that dir to StateMachineDir. However, when
> StateMachineDir is not empty, delete() fails, and renameTo() then fails
> too. Neither return value is checked, so the handler still reports success.
> In this scenario, the latest snapshot files remain in the tmp dir and the
> state machine cannot load this snapshot.
>
> StateMachineDir can be non-empty because previously installed snapshots
> are stored there and, depending on the retention policy, may not have been
> cleaned up. The problem then appears the next time the leader installs a
> snapshot.
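
The failure described above can be reproduced outside Ratis. What follows is
only a sketch under assumptions: the class SnapshotDirSwap and the methods
deleteRecursively and replaceDir are hypothetical names, not Ratis code, and
wiping the old StateMachineDir is just one possible policy. It relies on
java.nio.file.Files, whose delete and move throw an IOException instead of
returning false.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical demo class; none of these names exist in Ratis.
final class SnapshotDirSwap {

  /** Deletes a directory tree; any failure surfaces as an IOException. */
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException e)
          throws IOException {
        if (e != null) {
          throw e;
        }
        Files.delete(dir); // directory is empty at this point
        return FileVisitResult.CONTINUE;
      }
    });
  }

  /**
   * Assumed policy: drop the old target dir, then move tmpDir into its place.
   * Both steps throw on failure, so the caller cannot report success by accident.
   */
  static void replaceDir(Path tmpDir, Path targetDir) throws IOException {
    deleteRecursively(targetDir);
    Files.move(tmpDir, targetDir);
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("snapshot-demo");
    Path stateMachineDir = Files.createDirectory(base.resolve("sm"));
    Path tmpDir = Files.createDirectory(base.resolve("tmp"));
    Files.write(stateMachineDir.resolve("snapshot.old"), new byte[] {1});
    Files.write(tmpDir.resolve("snapshot.new"), new byte[] {2});

    // The pattern from the report: results are returned as booleans,
    // so ignoring them hides the failure.
    boolean deleted = stateMachineDir.toFile().delete();
    boolean renamed = tmpDir.toFile().renameTo(stateMachineDir.toFile());
    System.out.println("delete=" + deleted + ", renameTo=" + renamed);

    // The checked alternative, run only if the legacy rename did not succeed.
    if (!renamed) {
      replaceDir(tmpDir, stateMachineDir);
    }
    System.out.println("new snapshot in place: "
        + Files.exists(stateMachineDir.resolve("snapshot.new")));
  }
}
```

Whether the old snapshots should really be removed is a separate question,
given the retention policy mentioned above. At the least, checking the
results of delete() and renameTo(), or switching to Files.delete and
Files.move, would let the follower reply with a failure instead of success.
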
>
> Thanks!
>
> William Song
> Apache IoTDB
