[ 
https://issues.apache.org/jira/browse/RATIS-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved RATIS-1862.
-------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

The pull request is now merged.  Thanks, [~tanxinyu]!

> Add the parameter whether to take Snapshot when stopping to adapt to 
> different services
> ---------------------------------------------------------------------------------------
>
>                 Key: RATIS-1862
>                 URL: https://issues.apache.org/jira/browse/RATIS-1862
>             Project: Ratis
>          Issue Type: New Feature
>          Components: server
>            Reporter: Xinyu Tan
>            Assignee: Xinyu Tan
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: image-2023-07-28-11-18-28-876.png, 
> image-2023-07-28-11-18-52-826.png, image-2023-07-28-12-59-28-050.png, 
> image-2023-07-28-13-06-00-209.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> During recent daily testing, we found that stopping a RaftServer could 
> trigger a snapshot that took close to 40s, even when the state in the 
> state machine had not changed. This is not what we expect. When we want a 
> snapshot for certain regions, we take it explicitly through the 
> triggerSnapshot interface, so we do not want the RaftServer itself to take 
> a snapshot when it stops.
> !image-2023-07-28-12-59-28-050.png!
> After exploring the code, we found that the snapshot was triggered by the 
> StateMachineUpdater, which basically triggered a snapshot whenever the 
> applyIndex and commitIndex were equal when the cluster was stopped.
> !image-2023-07-28-11-18-28-876.png!
> !image-2023-07-28-11-18-52-826.png!
> !image-2023-07-28-13-06-00-209.png!
> We would like to adjust this logic by adding a 
> triggerSnapshotWhenStopEnabled parameter, defaulting to true, and checking 
> it in the shouldTakeSnapshot function. The default keeps existing services 
> that rely on a snapshot at shutdown fully compatible, while IoTDB can set 
> the parameter to false to avoid a snapshot it does not want.
> What's your opinion? [~szetszwo] 
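
The proposal in the description can be illustrated with a minimal, self-contained sketch. This is not the Ratis source: the class name, method name, and signature below are hypothetical, and only the flag name triggerSnapshotWhenStopEnabled and the "applyIndex equals commitIndex at stop" condition come from the description. It covers the stop-time decision only and ignores any other snapshot triggers.

```java
// Hypothetical sketch of the stop-time snapshot decision; not Ratis code.
class SnapshotOnStopSketch {

  /**
   * Returns true when a snapshot should be taken because the server is stopping.
   *
   * @param stopping whether the server is shutting down
   * @param applyIndex index of the last log entry applied to the state machine
   * @param commitIndex index of the last committed log entry
   * @param triggerSnapshotWhenStopEnabled the proposed switch (default true)
   */
  static boolean shouldTakeSnapshotOnStop(boolean stopping, long applyIndex,
      long commitIndex, boolean triggerSnapshotWhenStopEnabled) {
    if (!stopping) {
      // Snapshot triggers for a running server are out of scope for this sketch.
      return false;
    }
    // Previous behaviour: snapshot whenever everything committed has been applied.
    // Proposed behaviour: additionally require the new flag to be enabled.
    return triggerSnapshotWhenStopEnabled && applyIndex == commitIndex;
  }

  public static void main(String[] args) {
    // Default (flag = true): behaves as before and snapshots on stop.
    System.out.println(shouldTakeSnapshotOnStop(true, 100L, 100L, true));
    // Flag disabled, as a service such as IoTDB might configure it: no snapshot.
    System.out.println(shouldTakeSnapshotOnStop(true, 100L, 100L, false));
  }
}
```

With the flag left at its default the stop-time behaviour is unchanged, which is why the change is described as compatible with existing services.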



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
