[
https://issues.apache.org/jira/browse/RATIS-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinyu Tan updated RATIS-1862:
-----------------------------
Description:
Recently, during our daily testing, we found that stopping a RaftServer could
trigger a snapshot that took close to 40s, even though the state in the state
machine had not changed. This is not what we expect. If we want to take a
snapshot for some regions, we do so actively through the triggerSnapshot
interface. We do not want the RaftServer itself to take snapshots when it
stops.
!image-2023-07-28-12-59-28-050.png!
After exploring the code, we found that the snapshot was triggered by the
StateMachineUpdater, which basically triggered a snapshot whenever the
applyIndex and commitIndex were equal when the cluster was stopped.
!image-2023-07-28-11-18-28-876.png!
!image-2023-07-28-11-18-52-826.png!
!image-2023-07-28-13-06-00-209.png!
We want to tweak the logic here by adding a triggerSnapshotWhenStopEnabled
parameter, with a default value of true, and checking it in the
shouldTakeSnapshot function. Because the default is true, this stays fully
compatible with existing services that take snapshots when the cluster is
stopped. In the case of IoTDB, we can set the parameter to false to avoid
triggering a snapshot that we do not expect.
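To make the proposal concrete, here is a minimal sketch of how such a flag
could gate the stop-time decision. It is a simplified model, not the actual
StateMachineUpdater code: the class name StopSnapshotSketch and the method
arguments are assumptions made for illustration, and only the names
triggerSnapshotWhenStopEnabled and shouldTakeSnapshot come from the
description above.

```java
// Hypothetical, simplified model of the stop-time snapshot decision.
// This is NOT the Ratis StateMachineUpdater source; the class name and the
// method arguments are assumptions made for illustration only.
final class StopSnapshotSketch {
  // Proposed parameter; true keeps the current stop-time behaviour.
  private final boolean triggerSnapshotWhenStopEnabled;

  StopSnapshotSketch(boolean triggerSnapshotWhenStopEnabled) {
    this.triggerSnapshotWhenStopEnabled = triggerSnapshotWhenStopEnabled;
  }

  /**
   * Decides whether a snapshot should be taken.
   *
   * @param stopping    true when the server is shutting down
   * @param applyIndex  index of the last applied log entry
   * @param commitIndex index of the last committed log entry
   */
  boolean shouldTakeSnapshot(boolean stopping, long applyIndex, long commitIndex) {
    if (!stopping) {
      // Triggers while running (e.g. explicit triggerSnapshot calls) are
      // outside the scope of this sketch.
      return false;
    }
    if (!triggerSnapshotWhenStopEnabled) {
      // The service has opted out of snapshots on stop.
      return false;
    }
    // Behaviour as described in this issue: snapshot on stop once the
    // applied index has caught up with the committed index.
    return applyIndex == commitIndex;
  }
}
```

With the flag left at true the stop-time decision is unchanged; a service such
as IoTDB would construct it with false and rely on explicit triggerSnapshot
calls instead.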
What's your opinion? [~szetszwo]
Do you have any comments?
was:
Recently during our daily testing, we found that when we stopped RaftServer, a
snapshot could be triggered, taking close to 40s, even if the state in the
statemachine had not changed. This is not in line with our expectations. If we
want to take a snapshot for some regions, we will do so actively through the
triggerSnapshot interface. We don't actually want the RaftServer itself to take
snapshots when it stops
!image-2023-07-28-12-59-28-050.png!
After exploring the code, we found that the snapshot was triggered by the
StateMachineUpdater, which basically triggered a snapshot whenever the
applyIndex and commitIndex were equal when the cluster was stopped.
!image-2023-07-28-11-18-28-876.png!
!image-2023-07-28-11-18-52-826.png!
!image-2023-07-28-13-06-00-209.png!
We want to tweak the logic here. Add a enableTriggerSnapshotWhenStop parameter,
the default value is true. We'll put that in the shouldTakeSnapshot function.
This is fully compatible with other existing services that take snapshots when
the cluster is stopped. But in the case of IoTDB, we can set this parameter to
false to avoid launching a snapshot that does not meet our expectations.
What's your opinion? [~szetszwo]
Do you have any comments?
> Add the parameter whether to take Snapshot when stopping to adapt to
> different services
> ---------------------------------------------------------------------------------------
>
> Key: RATIS-1862
> URL: https://issues.apache.org/jira/browse/RATIS-1862
> Project: Ratis
> Issue Type: New Feature
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Attachments: image-2023-07-28-11-18-28-876.png,
> image-2023-07-28-11-18-52-826.png, image-2023-07-28-12-59-28-050.png,
> image-2023-07-28-13-06-00-209.png
>
>
> Recently, during our daily testing, we found that stopping a RaftServer could
> trigger a snapshot that took close to 40s, even though the state in the state
> machine had not changed. This is not what we expect. If we want to take a
> snapshot for some regions, we do so actively through the triggerSnapshot
> interface. We do not want the RaftServer itself to take snapshots when it
> stops.
> !image-2023-07-28-12-59-28-050.png!
> After exploring the code, we found that the snapshot was triggered by the
> StateMachineUpdater, which basically triggered a snapshot whenever the
> applyIndex and commitIndex were equal when the cluster was stopped.
> !image-2023-07-28-11-18-28-876.png!
> !image-2023-07-28-11-18-52-826.png!
> !image-2023-07-28-13-06-00-209.png!
> We want to tweak the logic here by adding a triggerSnapshotWhenStopEnabled
> parameter, with a default value of true, and checking it in the
> shouldTakeSnapshot function. Because the default is true, this stays fully
> compatible with existing services that take snapshots when the cluster is
> stopped. In the case of IoTDB, we can set the parameter to false to avoid
> triggering a snapshot that we do not expect.
>
>
> What's your opinion? [~szetszwo]
>
> Do you have any comments?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)