Xinyu Tan created RATIS-1862:
--------------------------------
Summary: Add the parameter whether to take Snapshot when stopping
to adapt to different services
Key: RATIS-1862
URL: https://issues.apache.org/jira/browse/RATIS-1862
Project: Ratis
Issue Type: New Feature
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2023-07-28-11-18-28-876.png,
image-2023-07-28-11-18-52-826.png, image-2023-07-28-12-59-28-050.png,
image-2023-07-28-13-06-00-209.png
Recently, during our daily testing, we found that stopping a RaftServer could
trigger a snapshot that took close to 40s, even though the state in the
state machine had not changed. This is not what we expect. If we want to take
a snapshot for some regions, we do so actively through the triggerSnapshot
interface. We don't want the RaftServer itself to take snapshots when it
stops.
!image-2023-07-28-12-59-28-050.png!
After exploring the code, we found that the snapshot was triggered by the
StateMachineUpdater, which basically triggered a snapshot whenever the
applyIndex and commitIndex were equal when the cluster was stopped.
!image-2023-07-28-11-18-28-876.png!
!image-2023-07-28-11-18-52-826.png!
!image-2023-07-28-13-06-00-209.png!
We want to tweak the logic here: add an enableTriggerSnapshotWhenStop
parameter, with a default value of true, and check it in the shouldStop
function. The default keeps this fully compatible with existing services that
take snapshots when the cluster is stopped, while IoTDB can set the parameter
to false to avoid a snapshot we do not want.
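
As a rough illustration of the proposal, here is a minimal, self-contained
sketch of the decision. It is not the actual StateMachineUpdater code: the
class StopSnapshotSketch and the method shouldSnapshotOnStop are hypothetical
names, and the only things taken from this issue are the parameter name
enableTriggerSnapshotWhenStop, its default of true, and the condition that the
applied index equals the committed index at stop time.

```java
/**
 * Hypothetical model of the stop-time snapshot decision described in this
 * issue. It does not use or reproduce any real Ratis class.
 */
final class StopSnapshotSketch {
    /** Stand-in for the proposed enableTriggerSnapshotWhenStop parameter. */
    private final boolean enableTriggerSnapshotWhenStop;

    /** Uses the proposed default (true), which keeps today's behavior. */
    StopSnapshotSketch() {
        this(true);
    }

    StopSnapshotSketch(boolean enableTriggerSnapshotWhenStop) {
        this.enableTriggerSnapshotWhenStop = enableTriggerSnapshotWhenStop;
    }

    /**
     * Returns true if a snapshot should be taken while stopping.
     * Behavior described in the issue: snapshot whenever the applied index
     * has caught up with the committed index. The proposal only adds the
     * flag check in front of that condition.
     */
    boolean shouldSnapshotOnStop(long appliedIndex, long committedIndex) {
        return enableTriggerSnapshotWhenStop && appliedIndex == committedIndex;
    }
}
```

With the default constructor the result matches the existing behavior, so
current users would see no change; a service such as IoTDB would construct
the equivalent of new StopSnapshotSketch(false) and never snapshot on stop.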
What's your opinion, [~szetszwo]? Do you have any comments?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)