This is an automated email from the ASF dual-hosted git repository.

szetszwo pushed a commit to branch branch-2_tmp
in repository https://gitbox.apache.org/repos/asf/ratis.git

commit fb176cfea734fc5cc489c08adb0aa8be61894f3f
Author: William Song <[email protected]>
AuthorDate: Tue Nov 29 08:06:26 2022 +0800

    RATIS-1750. Add snapshot section in dev guide (#788)
    
    (cherry picked from commit 96b2f267e81959d345725a5dc840725b32fd1b2b)
---
 ratis-docs/src/site/markdown/snapshot.md | 240 +++++++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)

diff --git a/ratis-docs/src/site/markdown/snapshot.md 
b/ratis-docs/src/site/markdown/snapshot.md
new file mode 100644
index 000000000..f20dc19d7
--- /dev/null
+++ b/ratis-docs/src/site/markdown/snapshot.md
@@ -0,0 +1,240 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# Snapshot Guide
+
+## Overview
+
+Raft log grows during normal operation. As it grows larger, it occupies more 
space and takes more time to replay. 
+Therefore, some form of log compaction is necessary for practical systems.
+In Ratis, we introduce snapshot mechanism as the way to do log compaction. 
+
+The basic idea of snapshot is to create and save a snapshot reflecting the 
latest state of the state machine, 
+and then delete previous logs up to the checkpoint.
+
+## Implement snapshot
+
+To enable snapshot, we have to first implement the following two methods in 
`StateMachine`:
+
+```java
+/**
+ * Dump the in-memory state into a snapshot file in the RaftStorage. The
+ * StateMachine implementation can decide 1) its own snapshot format, 2) when
+ * a snapshot is taken, and 3) how the snapshot is taken (e.g., whether the
+ * snapshot blocks the state machine, and whether to purge log entries after
+ * a snapshot is done).
+ *
+ * The snapshot should include the latest raft configuration.
+ *
+ * @return the largest index of the log entry that has been applied to the
+ *         state machine and also included in the snapshot. Note the log purge
+ *         should be handled separately.
+ */
+long takeSnapshot() throws IOException;
+```
+
+```java
+/**
+ * Returns the information for the latest durable snapshot.
+ */
+SnapshotInfo getLatestSnapshot();
+```
+
+Snapshotting for memory-based state machines is conceptually simple. In
+snapshotting, the entire current system state is written to a snapshot on 
stable storage.
+
+With disk-based state machines, a recent copy of the system state is 
maintained on disk as
+part of normal operation. Thus, the Raft log can be discarded as soon as the 
state machine
+reflects writes to disk, and snapshotting is used only when sending consistent 
disk images to
+other servers.
+
+Examples of snapshot implementation can be found at 
+https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/arithmetic/ArithmeticStateMachine.java.
+
+## Trigger a snapshot
+
+To trigger a snapshot, we can either
+
+* Trigger snapshot manually using `SnapshotManagementApi`. 
+Note that Ratis imposes a minimal creation gap between two subsequent snapshot 
creation:
+
+  ```java
+  // SnapshotManagementApi
+  RaftClientReply create(long timeoutMs) throws IOException;
+  ```
+  
+  ```java
+  // customize snapshot creation gap
+  RaftServerConfigKeys.Snapshot.setCreationGap(properties, 1024L);
+  ```
+  
+* Enable triggering snapshot automatically when log size exceeds limit (in 
number of applied log entries). 
+To do so, we may turn on the auto-trigger-snapshot option in RaftProperties 
+and set an appropriate triggering threshold.
+
+  ```java
+  RaftServerConfigKeys.Snapshot.setAutoTriggerEnabled(properties, true);
+  RaftServerConfigKeys.Snapshot.setAutoTriggerThreshold(properties, 400000);
+  ```
+
+## Purge obsolete logs after snapshot
+
+When a snapshot is taken, Ratis will automatically discard obsolete logs.
+By default, Ratis will purge logs up to min(snapshot index, safe index), 
+where safe index equals to the minimal commit index of all group members.
+In other words, if a log is committed by all group members and is included in 
the latest snapshot,
+then this log can be safely deleted.
+
+Sometimes we can choose to aggressively purge the logs up to the snapshot 
index 
+even if some peers do not have commit index up to snapshot index. 
+To do this, we can turn on the purge-up-to-snapshot-index option in 
RaftProperties:
+
+```java
+RaftServerConfigKeys.Log.setPurgeUptoSnapshotIndex(properties, true);
+```
+
+Purging the logs up to snapshot index sometimes leads to unnecessary burst of 
network bandwidth. 
+That is, even if a follower lags behind only a few logs, the leader still 
needs to transfer the full snapshot. 
+To avoid this situation, we can preserve some recent logs when purging logs up 
to snapshot index. 
+To do this, we can set the number of logs to be preserved when purging in 
RaftProperties:
+
+```java
+RaftServerConfigKeys.Log.setPurgePreservationLogNum(properties, n);
+```
+Intuitively, cost of transferring n logs shall equal the cost of transferring 
the full snapshot.
+
+## Retain multiple versions of snapshot
+
+Ratis allows `StateMachine` to retain multiple versions of snapshot. 
+To enable this feature, we have to:
+
+1. Set how many number of versions to retain in RaftProperties when building 
the RaftServer:
+   ```java
+   RaftServerConfigKeys.Snapshot.setRetentionFileNum(properties, 2);
+   ```
+2. Implement `StateMachineStorage.cleanupOldSnapshots` to clean up old 
versions of snapshot:
+   ```java
+   // StateMachineStorage.java
+   void cleanupOldSnapshots(SnapshotRetentionPolicy snapshotRetentionPolicy) 
throws IOException;
+   ```
+
+## Load the latest snapshot
+
+1. When a RaftServer restarts and tries to recover the data, it first has to 
read and load the latest snapshot,
+   and then apply the logs not included in the snapshot. 
+   We have to implement snapshot loading in `StateMachine.initialize` 
lifecycle hook.
+
+    ```java
+    /*
+    * Initializes the State Machine with the given parameter.
+    * The state machine must, if there is any, read the latest snapshot.
+    */
+    void initialize(RaftServer raftServer, RaftGroupId raftGroupId, 
RaftStorage storage) throws IOException;
+    ```
+
+2. When a RaftServer newly joins an existing cluster, 
+   it has first to obtain the latest snapshot and install it locally. Before 
installing a snapshot,
+   `StateMachine.pause` hook is called to ensure that a new snapshot can be 
installed.
+   ```java
+   /* 
+   * Pauses the state machine. On return, the state machine should have closed 
all open files so
+   * that a new snapshot can be installed.
+   */
+   void pause();
+    ```
+   After installing the snapshot, `StateMachine.reinitialize` is called. 
+   We shall initialize the state machine with the latest installed snapshot in 
this lifecycle hook.
+   ```java
+   /**
+   * Re-initializes the State Machine in PAUSED state. The
+   * state machine is responsible reading the latest snapshot from the file 
system (if any) and
+   * initialize itself with the latest term and index there including all the 
edits.
+   */
+   void reinitialize() throws IOException;
+   ```
+   
+
+## Customize snapshot storage path
+
+By default, Ratis assumes `StateMachine` snapshot files be placed under 
+`RaftStorageDirectory.getStateMachineDir()`. When leader installs a snapshot 
to the follower,
+Ratis will keep the snapshot layout in follower side unchanged relative to 
this directory. 
+That is, the installed snapshot under follower's state machine directory
+will have the same hierarchy as in the leader side.
+
+`StateMachine` can also customize the snapshot storage and install directory. 
To do this, 
+we may provide the snapshot storage root directory in `StateMachineStorage`, 
+together with a temporary directory holding in-transmitting snapshot files.
+
+```java
+// StateMachineStorage
+/** @return the state machine directory. */
+default File getSnapshotDir() {
+  return null;
+}
+
+/** @return the temporary directory. */
+default File getTmpDir() {
+  return null;
+}
+```
+Examples of customizing snapshot storage path can be found at
+https://github.com/apache/ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/InstallSnapshotFromLeaderTests.java
+
+## Customize snapshot installation
+
+When a new follower joins the cluster, leader will install the latest snapshot 
to the follower.
+`StateMachine` only needs to provide the `SnapshotInfo` of the latest 
snapshot, and it's Ratis'
+responsibility to handle all the nuances of dividing snapshot into chunks / 
transferring chunks / 
+validating checksum. This default implementation works well in most scenarios. 
+
+However, there are scenarios where the default implementation cannot satisfy. 
To name a few:
+1. `StateMachine` has a flexible snapshot layout with files scattering around 
different directories.
+2. Follower wants to fully utilize underling topology. For example, follower 
may download the snapshot 
+from the nearest follower instead of the leader.
+3. Follower wants to download snapshot from different sources in parallel. 
+
+Ratis provides `InstallSnapshotNotification` to let `StateMachine` take over 
the full control
+of snapshot installation. To enable this feature, we may first turn off the 
leader-install-snapshot option
+to disable leader explicitly installing snapshot to follower:
+```java
+RaftServerConfigKeys.Log.Appender.setInstallSnapshotEnabled(properties, false);
+```
+After this option is disabled, Whenever the leader detects that a follower 
needs snapshot, 
+instead of installing snapshot to that follower, 
+leader will only send a snapshot installation notification. 
+It's now the follower's responsibility to install the latest snapshot 
asynchronously.
+To do this, we have to implement 
`FollowerEventApi.notifyInstallSnapshotFromLeader`:
+
+```java
+interface FollowerEventApi {
+  /**
+   * Notify the {@link StateMachine} that the leader has purged entries from 
its log.
+   * In order to catch up, the {@link StateMachine} has to install the latest 
snapshot asynchronously.
+   *
+   * @param roleInfoProto information about the current node role and rpc 
delay information.
+   * @param firstTermIndexInLog The term-index of the first append entry 
available in the leader's log.
+   * @return return the last term-index in the snapshot after the snapshot 
installation.
+   */
+  default CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
+      RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
+    return CompletableFuture.completedFuture(null);
+  }
+}
+```
+Examples of `notifyInstallSnapshotFromLeader` implementation can be found at 
+https://github.com/apache/ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/InstallSnapshotNotificationTests.java
\ No newline at end of file

Reply via email to