[ https://issues.apache.org/jira/browse/RATIS-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Tan updated RATIS-1862:
-----------------------------
    Description: 
Recently, during our daily testing, we found that stopping a RaftServer could 
trigger a snapshot that took close to 40s, even though the state in the state 
machine had not changed. This is not what we expect. When we want to take a 
snapshot for some regions, we do so explicitly through the triggerSnapshot 
interface; we do not want the RaftServer itself to take snapshots when it 
stops.

!image-2023-07-28-12-59-28-050.png!

After exploring the code, we found that the snapshot was triggered by the 
StateMachineUpdater, which takes a snapshot whenever applyIndex equals 
commitIndex at the time the cluster is stopped.
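The stop-time behavior described above can be modeled roughly as follows. This is a minimal sketch that assumes only what this report states (a snapshot is taken whenever applyIndex equals commitIndex when stopping); StopSnapshotModel and shouldTakeSnapshotOnStop are hypothetical names, not Ratis source, and the real StateMachineUpdater may check further conditions.

{code:java}
/**
 * Hypothetical, simplified model of the stop-time snapshot check reported
 * in this issue. NOT the Ratis StateMachineUpdater source; names are
 * illustrative only.
 */
final class StopSnapshotModel {
  private StopSnapshotModel() {}

  /**
   * Returns true if a snapshot would be taken while stopping.
   *
   * @param appliedIndex index of the last entry applied to the state machine
   * @param commitIndex  index of the last committed log entry
   */
  static boolean shouldTakeSnapshotOnStop(long appliedIndex, long commitIndex) {
    // The only condition modeled here is "everything committed has been applied".
    // Nothing checks whether the state changed since the previous snapshot,
    // so an unchanged state machine is still snapshotted on every stop.
    return appliedIndex == commitIndex;
  }
}
{code}

In this model, shouldTakeSnapshotOnStop(100, 100) returns true even if index 100 was already covered by an earlier snapshot, which matches the unnecessary stop-time snapshot described above.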

!image-2023-07-28-11-18-28-876.png!
!image-2023-07-28-11-18-52-826.png!

!image-2023-07-28-13-06-00-209.png!

We want to adjust this logic by adding a triggerSnapshotWhenStopEnabled 
parameter, with a default value of true, and checking it in the 
shouldTakeSnapshot function. The default keeps the change fully compatible 
with existing services that expect a snapshot when the cluster is stopped, 
while IoTDB can set the parameter to false to avoid snapshots on stop that it 
did not request.
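The proposed switch could look something like the sketch below. It assumes the parameter name from this proposal (triggerSnapshotWhenStopEnabled, default true) and reuses the applyIndex == commitIndex rule described above; SnapshotOnStopPolicy and shouldTakeSnapshotOnStop are hypothetical stand-ins for the stop-time branch of shouldTakeSnapshot, not an existing Ratis API.

{code:java}
/**
 * Hypothetical sketch of the proposed triggerSnapshotWhenStopEnabled switch.
 * Names are illustrative; this is not an existing Ratis API.
 */
final class SnapshotOnStopPolicy {
  /** Proposed default: true, which preserves the current stop-time behavior. */
  static final boolean DEFAULT_TRIGGER_SNAPSHOT_WHEN_STOP_ENABLED = true;

  private final boolean triggerSnapshotWhenStopEnabled;

  SnapshotOnStopPolicy() {
    this(DEFAULT_TRIGGER_SNAPSHOT_WHEN_STOP_ENABLED);
  }

  SnapshotOnStopPolicy(boolean triggerSnapshotWhenStopEnabled) {
    this.triggerSnapshotWhenStopEnabled = triggerSnapshotWhenStopEnabled;
  }

  /**
   * Simplified stand-in for the stop-time branch of shouldTakeSnapshot.
   * Running-state triggers (e.g. an auto-trigger threshold) are omitted.
   */
  boolean shouldTakeSnapshotOnStop(long appliedIndex, long commitIndex) {
    if (!triggerSnapshotWhenStopEnabled) {
      // A service such as IoTDB opts out and triggers snapshots explicitly instead.
      return false;
    }
    // Flag on: keep the existing rule, i.e. snapshot once all committed entries are applied.
    return appliedIndex == commitIndex;
  }
}
{code}

A service that relies on today's behavior keeps using new SnapshotOnStopPolicy(), while IoTDB would use new SnapshotOnStopPolicy(false). Defaulting to true is what keeps the change backward compatible.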

What's your opinion? [~szetszwo] 

  was:
Recently, during our daily testing, we found that stopping a RaftServer could 
trigger a snapshot that took close to 40s, even though the state in the state 
machine had not changed. This is not what we expect. When we want to take a 
snapshot for some regions, we do so explicitly through the triggerSnapshot 
interface; we do not want the RaftServer itself to take snapshots when it 
stops.

!image-2023-07-28-12-59-28-050.png!

After exploring the code, we found that the snapshot was triggered by the 
StateMachineUpdater, which takes a snapshot whenever applyIndex equals 
commitIndex at the time the cluster is stopped.

!image-2023-07-28-11-18-28-876.png!
!image-2023-07-28-11-18-52-826.png!

!image-2023-07-28-13-06-00-209.png!

We want to adjust this logic by adding a triggerSnapshotWhenStopEnabled 
parameter, with a default value of true, and checking it in the 
shouldTakeSnapshot function. The default keeps the change fully compatible 
with existing services that expect a snapshot when the cluster is stopped, 
while IoTDB can set the parameter to false to avoid snapshots on stop that it 
did not request.

What's your opinion? [~szetszwo] 

Do you have any comments?


> Add the parameter whether to take Snapshot when stopping to adapt to 
> different services
> ---------------------------------------------------------------------------------------
>
>                 Key: RATIS-1862
>                 URL: https://issues.apache.org/jira/browse/RATIS-1862
>             Project: Ratis
>          Issue Type: New Feature
>            Reporter: Xinyu Tan
>            Assignee: Xinyu Tan
>            Priority: Major
>         Attachments: image-2023-07-28-11-18-28-876.png, 
> image-2023-07-28-11-18-52-826.png, image-2023-07-28-12-59-28-050.png, 
> image-2023-07-28-13-06-00-209.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
