[ 
https://issues.apache.org/jira/browse/RATIS-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1369:
-----------------------------
    Description: 
For now, if there is no snapshot existed, the returned snapshotIndex is 0.

It is not correctly, since the raft log starts from index 0 (BTW, in raft 
paper, the first valid log index is 1, but in ratis, it is 0), a snapshot taken 
at snapshotIndex 0 should contain the log entry 0. The snapshotIndex indicating 
an empty/fake snapshot should be -1.

 

The fix is as follows: 
{code:java}
diff --git 
a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java 
b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
index 728f7e9c..6307ca79 100644
--- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
+++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
@@ -431,7 +431,7 @@ class ServerState implements Closeable {
 
   long getLatestInstalledSnapshotIndex() {
     final TermIndex ti = latestInstalledSnapshot.get();
-    return ti != null? ti.getIndex(): 0L;
+    return ti != null? ti.getIndex(): -1L;
   }
 
   /**
@@ -440,7 +440,7 @@ class ServerState implements Closeable {
    */
   long getSnapshotIndex() {
     final SnapshotInfo s = getLatestSnapshot();
-    final long latestSnapshotIndex = s != null ? s.getIndex() : 0;
+    final long latestSnapshotIndex = s != null ? s.getIndex() : -1;
     return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex());
   }
{code}
 

This issue is found in the test of bootstrap SCM in SCM HA. 

 

*TL; DR*

*The issue can be reproduced by follow steps:*

1, use follow configuration for SCM HA
{code:java}
  <!-- scm snapshot -->
  <property>
    <name>ozone.scm.ha.ratis.snapshot.threshold</name>
    <value>1</value>
  </property>
  <property>
    <name>ozone.scm.ha.ratis.log.purge.gap</name>
    <value>5</value>
  </property>
  <property>
    <name>ozone.scm.ha.ratis.log.purge.enabled</name>
    <value>true</value>
  </property>
{code}
2, int and start SCM1

3, start DN1, DN2, DN3

4, wait for a while, until there is dozens of raft log entries.

5, bootstrap SCM2

6, start SCM2, SCM1 will be stuck in appendEntries Timeout.

 

 *The RCA is:*

after init SCM1, the raft log is 
{code:java}
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
/tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
log_inprogress_0  raft-meta  raft-meta.conf
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
{code}
 

after SCM1 is started, and there is dozens of log entries generated, the raft 
log is
{code:java}
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
/tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
log_inprogress_1  raft-meta  raft-meta.conf
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
{code}
Since purge is enable and purge gap is 5, the file log_0_0 is removed. The 
leader does not have entry 0 any more.

 

Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot 
from leader, since leader miss log entry 0, and SCM2 is empty.

The follow SCM do receive an installSnapshot request,
{code:java}
021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
 i:1)
{code}
But due to the default snapshot index is 0, 
{code:java}
if (inProgressInstallSnapshotRequest.compareAndSet(null, 
firstAvailableLogTermIndex)) {

  // Check if snapshot index is already at par or ahead of the first
  // available log index of the Leader.
  long snapshotIndex = state.getSnapshotIndex();
  if (snapshotIndex + 1 >= firstAvailableLogIndex) {
    // State Machine has already installed the snapshot. Return the
    // latest snapshot index to the Leader.

    inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, 
null);
    final InstallSnapshotReplyProto reply = 
ServerProtoUtils.toInstallSnapshotReplyProto(
        leaderId, getMemberId(), currentTerm, 
InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
    LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), 
snapshotIndex);
    return reply;
  }
{code}
0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore the 
 downloading.
{code:java}
2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
 i:1)
2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM 
is still follower.
2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from 
null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, 
leader elected after 1416ms
2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine 
snapshotIndex is 0
2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new configuration 
index: 1
configurationEntry {
  peers {
    id: "630b3fbd-861f-4676-8e16-8ae548ed6658"
    address: "9.134.50.210:9865"
  }
}
 from snapshot
2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG 
org.apache.ratis.util.LifeCycle: 
630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0:
 NEW
2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: 
[630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0],
 old=null
2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
55560a65-e7f7-46f0-b463-f511187fd358: reply 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
{code}
 

In the leader side, the matchIndex is updated to 0, and the nextIndex is 
updated to 1.
{code:java}
@Override
public void setSnapshotIndex(long newSnapshotIndex) {
  snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
  matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
  nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange);
}
{code}
{code:java}
2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO 
org.apache.ratis.grpc.server.GrpcLogAppender: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
 received the first reply 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO 
org.apache.ratis.grpc.server.GrpcLogAppender: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
 Already Installed Snapshot Index 0.
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 snapshotIndex: setUnconditionally 0 -> 0
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 matchIndex: setUnconditionally 0 -> 0
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 nextIndex: setUnconditionally 0 -> 1
2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 commitIndex: updateToMax old=-1, new=0, updated? true
{code}
 

Since matchIndex is monotonically increased, and the nextIndex will at least be 
matchIndex + 1, 
{code:java}
final long nextIndex = 1 + Optional.ofNullable(request)
    .map(AppendEntriesRequest::getPreviousLog)
    .map(TermIndex::getIndex)
    .orElseGet(getFollower()::getMatchIndex);
{code}
Leader keep sending entries from index 1, but follower is empty, missing entry 
0, the cluster stuck in inconsistency append entries.

 

With the fix, the download of checkpoint happened, and the system works well.
{code:java}
2021-04-26 11:23:10,761 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: receive 
installSnapshot: 
6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2,
 i:1)
2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM 
is still follower.
2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: change Leader from 
null to 6e7bb411-69c5-40ea-8238-bc65cc4e9b55 at term 2 for installSnapshot, 
leader elected after 1391ms
2021-04-26 11:23:10,787 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: notifyInstallSnapshot: 
nextIndex is 0 but the leader's first available index is 1.
2021-04-26 11:23:11,090 [grpc-default-executor-0] INFO 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Received install snapshot 
notification from SCM leader: 9.134.50.210:9865 with term index: (t:2, i:1)
2021-04-26 11:23:11,091 [pool-13-thread-1] INFO 
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl: Downloading checkpoint from 
leader SCM scm1 and reloading state from the checkpoint.
2021-04-26 11:23:11,092 [grpc-default-executor-0] DEBUG 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: Snapshot Installation 
Request received and is in progress
2021-04-26 11:23:11,164 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set new configuration 
index: 1
configurationEntry {
  peers {
    id: "6e7bb411-69c5-40ea-8238-bc65cc4e9b55"
    address: "9.134.50.210:9865"
  }
}
 from snapshot
2021-04-26 11:23:11,166 [grpc-default-executor-0] DEBUG 
org.apache.ratis.util.LifeCycle: 
6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0:
 NEW
2021-04-26 11:23:11,167 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set configuration 1: 
[6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0],
 old=null
2021-04-26 11:23:11,171 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: reply installSnapshot: 
6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
2021-04-26 11:23:11,171 [grpc-default-executor-0] DEBUG 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf: reply 
6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
2021-04-26 11:23:11,175 [grpc-default-executor-0] INFO 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
a0ba8aa4-5cff-4452-9adf-638135a9ebaf: Completed INSTALL_SNAPSHOT, lastRequest: 
6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2,
 i:1)
{code}
 

  was:
For now, if there is no snapshot existed, the returned snapshotIndex is 0.

It is not correctly, since the raft log starts from index 0 (BTW, in raft 
paper, the first valid log index is 1, but in ratis, it is 0), a snapshot taken 
at snapshotIndex 0 should contain the log entry 0. The snapshotIndex indicating 
an empty/fake snapshot should be -1.

 
{code:java}
diff --git 
a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java 
b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
index 728f7e9c..6307ca79 100644
--- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
+++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
@@ -431,7 +431,7 @@ class ServerState implements Closeable {
 
   long getLatestInstalledSnapshotIndex() {
     final TermIndex ti = latestInstalledSnapshot.get();
-    return ti != null? ti.getIndex(): 0L;
+    return ti != null? ti.getIndex(): -1L;
   }
 
   /**
@@ -440,7 +440,7 @@ class ServerState implements Closeable {
    */
   long getSnapshotIndex() {
     final SnapshotInfo s = getLatestSnapshot();
-    final long latestSnapshotIndex = s != null ? s.getIndex() : 0;
+    final long latestSnapshotIndex = s != null ? s.getIndex() : -1;
     return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex());
   }
{code}
 

This issue is found in the test of bootstrap SCM in SCM HA. 

 

*TL; DR*

*The issue can be reproduced by follow steps:*

1, use follow configuration for SCM HA
{code:java}
  <!-- scm snapshot -->
  <property>
    <name>ozone.scm.ha.ratis.snapshot.threshold</name>
    <value>1</value>
  </property>
  <property>
    <name>ozone.scm.ha.ratis.log.purge.gap</name>
    <value>5</value>
  </property>
  <property>
    <name>ozone.scm.ha.ratis.log.purge.enabled</name>
    <value>true</value>
  </property>
{code}
2, int and start SCM1

3, start DN1, DN2, DN3

4, wait for a while, until there is dozens of raft log entries.

5, bootstrap SCM2

6, start SCM2, SCM1 will be stuck in appendEntries Timeout.

 

 *The RCA is:*

after init SCM1, the raft log is 

 
{code:java}
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
/tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
log_inprogress_0  raft-meta  raft-meta.conf
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
{code}
 

 

after SCM1 is started, and there is dozens of log entries generated, the raft 
log is
{code:java}
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
/tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
log_inprogress_1  raft-meta  raft-meta.conf
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
{code}
Since purge is enable and purge gap is 5, the file log_0_0 is removed. The 
leader does not have entry 0 any more.

Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot 
from leader, since leader miss log entry 0, and SCM2 is empty.

The follow SCM do receive an installSnapshot request,
{code:java}
021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
 i:1)
{code}
But due to the default snapshot index is 0, 
{code:java}
if (inProgressInstallSnapshotRequest.compareAndSet(null, 
firstAvailableLogTermIndex)) {

  // Check if snapshot index is already at par or ahead of the first
  // available log index of the Leader.
  long snapshotIndex = state.getSnapshotIndex();
  if (snapshotIndex + 1 >= firstAvailableLogIndex) {
    // State Machine has already installed the snapshot. Return the
    // latest snapshot index to the Leader.

    inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, 
null);
    final InstallSnapshotReplyProto reply = 
ServerProtoUtils.toInstallSnapshotReplyProto(
        leaderId, getMemberId(), currentTerm, 
InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
    LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), 
snapshotIndex);
    return reply;
  }
{code}
0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore the 
 downloading.
{code:java}
2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
 i:1)
2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM 
is still follower.
2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from 
null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, 
leader elected after 1416ms
2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine 
snapshotIndex is 0
2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new configuration 
index: 1
configurationEntry {
  peers {
    id: "630b3fbd-861f-4676-8e16-8ae548ed6658"
    address: "9.134.50.210:9865"
  }
}
 from snapshot
2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG 
org.apache.ratis.util.LifeCycle: 
630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0:
 NEW
2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: 
[630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0],
 old=null
2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO 
org.apache.ratis.server.RaftServer$Division: 
55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply installSnapshot: 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
55560a65-e7f7-46f0-b463-f511187fd358: reply 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
{code}
In the leader side, the matchIndex updates to 0, and the nextIndex updates to 1.
{code:java}
@Override
public void setSnapshotIndex(long newSnapshotIndex) {
  snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
  matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
  nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange);
}
{code}
 
{code:java}
2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO 
org.apache.ratis.grpc.server.GrpcLogAppender: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
 received the first reply 
630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO 
org.apache.ratis.grpc.server.GrpcLogAppender: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
 Already Installed Snapshot Index 0.
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 snapshotIndex: setUnconditionally 0 -> 0
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 matchIndex: setUnconditionally 0 -> 0
2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 nextIndex: setUnconditionally 0 -> 1
2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG 
org.apache.ratis.server.leader.FollowerInfo: 
630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
 commitIndex: updateToMax old=-1, new=0, updated? true
{code}
Since matchIndex is monotonically increased, and the nextIndex will at least be 
matchIndex + 1, 
{code:java}
final long nextIndex = 1 + Optional.ofNullable(request)
    .map(AppendEntriesRequest::getPreviousLog)
    .map(TermIndex::getIndex)
    .orElseGet(getFollower()::getMatchIndex);
{code}
Leader keep sending entries from index 1, but follower is empty, missing entry 
0, the cluster stuck in inconsistency append entries.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


> When there is no snapshot, the return snapshotIndex should be -1, not 0.
> ------------------------------------------------------------------------
>
>                 Key: RATIS-1369
>                 URL: https://issues.apache.org/jira/browse/RATIS-1369
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>         Attachments: ozone-root-scm-9.134.51.215.log - SCM2, follower.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For now, if there is no snapshot existed, the returned snapshotIndex is 0.
> It is not correctly, since the raft log starts from index 0 (BTW, in raft 
> paper, the first valid log index is 1, but in ratis, it is 0), a snapshot 
> taken at snapshotIndex 0 should contain the log entry 0. The snapshotIndex 
> indicating an empty/fake snapshot should be -1.
>  
> The fix is as follows: 
> {code:java}
> diff --git 
> a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java 
> b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> index 728f7e9c..6307ca79 100644
> --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> @@ -431,7 +431,7 @@ class ServerState implements Closeable {
>  
>    long getLatestInstalledSnapshotIndex() {
>      final TermIndex ti = latestInstalledSnapshot.get();
> -    return ti != null? ti.getIndex(): 0L;
> +    return ti != null? ti.getIndex(): -1L;
>    }
>  
>    /**
> @@ -440,7 +440,7 @@ class ServerState implements Closeable {
>     */
>    long getSnapshotIndex() {
>      final SnapshotInfo s = getLatestSnapshot();
> -    final long latestSnapshotIndex = s != null ? s.getIndex() : 0;
> +    final long latestSnapshotIndex = s != null ? s.getIndex() : -1;
>      return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex());
>    }
> {code}
>  
> This issue is found in the test of bootstrap SCM in SCM HA. 
>  
> *TL; DR*
> *The issue can be reproduced by follow steps:*
> 1, use follow configuration for SCM HA
> {code:java}
>   <!-- scm snapshot -->
>   <property>
>     <name>ozone.scm.ha.ratis.snapshot.threshold</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>ozone.scm.ha.ratis.log.purge.gap</name>
>     <value>5</value>
>   </property>
>   <property>
>     <name>ozone.scm.ha.ratis.log.purge.enabled</name>
>     <value>true</value>
>   </property>
> {code}
> 2, int and start SCM1
> 3, start DN1, DN2, DN3
> 4, wait for a while, until there is dozens of raft log entries.
> 5, bootstrap SCM2
> 6, start SCM2, SCM1 will be stuck in appendEntries Timeout.
>  
>  *The RCA is:*
> after init SCM1, the raft log is 
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
> /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_0  raft-meta  raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
> {code}
>  
> after SCM1 is started, and there is dozens of log entries generated, the raft 
> log is
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls 
> /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_1  raft-meta  raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
> {code}
> Since purge is enable and purge gap is 5, the file log_0_0 is removed. The 
> leader does not have entry 0 any more.
>  
> Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot 
> from leader, since leader miss log entry 0, and SCM2 is empty.
> The follow SCM do receive an installSnapshot request,
> {code:java}
> 021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
> installSnapshot: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
>  i:1)
> {code}
> But due to the default snapshot index is 0, 
> {code:java}
> if (inProgressInstallSnapshotRequest.compareAndSet(null, 
> firstAvailableLogTermIndex)) {
>   // Check if snapshot index is already at par or ahead of the first
>   // available log index of the Leader.
>   long snapshotIndex = state.getSnapshotIndex();
>   if (snapshotIndex + 1 >= firstAvailableLogIndex) {
>     // State Machine has already installed the snapshot. Return the
>     // latest snapshot index to the Leader.
>     
> inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, 
> null);
>     final InstallSnapshotReplyProto reply = 
> ServerProtoUtils.toInstallSnapshotReplyProto(
>         leaderId, getMemberId(), currentTerm, 
> InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
>     LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), 
> snapshotIndex);
>     return reply;
>   }
> {code}
> 0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore 
> the  downloading.
> {code:java}
> 2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive 
> installSnapshot: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2,
>  i:1)
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current 
> SCM is still follower.
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from 
> null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, 
> leader elected after 1416ms
> 2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine 
> snapshotIndex is 0
> 2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new 
> configuration index: 1
> configurationEntry {
>   peers {
>     id: "630b3fbd-861f-4676-8e16-8ae548ed6658"
>     address: "9.134.50.210:9865"
>   }
> }
>  from snapshot
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.util.LifeCycle: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0:
>  NEW
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: 
> [630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0],
>  old=null
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply 
> installSnapshot: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 55560a65-e7f7-46f0-b463-f511187fd358: reply 
> 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> {code}
>  
> In the leader side, the matchIndex is updated to 0, and the nextIndex is 
> updated to 1.
> {code:java}
> @Override
> public void setSnapshotIndex(long newSnapshotIndex) {
>   snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
>   matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
>   nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange);
> }
> {code}
> {code:java}
> 2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
>  received the first reply 
> 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO 
> org.apache.ratis.grpc.server.GrpcLogAppender: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler:
>  Already Installed Snapshot Index 0.
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.leader.FollowerInfo: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
>  snapshotIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.leader.FollowerInfo: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
>  matchIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.leader.FollowerInfo: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
>  nextIndex: setUnconditionally 0 -> 1
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.server.leader.FollowerInfo: 
> 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358:
>  commitIndex: updateToMax old=-1, new=0, updated? true
> {code}
>  
> Since matchIndex is monotonically increased, and the nextIndex will at least 
> be matchIndex + 1, 
> {code:java}
> final long nextIndex = 1 + Optional.ofNullable(request)
>     .map(AppendEntriesRequest::getPreviousLog)
>     .map(TermIndex::getIndex)
>     .orElseGet(getFollower()::getMatchIndex);
> {code}
> Leader keep sending entries from index 1, but follower is empty, missing 
> entry 0, the cluster stuck in inconsistency append entries.
>  
> With the fix, the download of checkpoint happened, and the system works well.
> {code:java}
> 2021-04-26 11:23:10,761 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: receive 
> installSnapshot: 
> 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2,
>  i:1)
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current 
> SCM is still follower.
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: change Leader from 
> null to 6e7bb411-69c5-40ea-8238-bc65cc4e9b55 at term 2 for installSnapshot, 
> leader elected after 1391ms
> 2021-04-26 11:23:10,787 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: 
> notifyInstallSnapshot: nextIndex is 0 but the leader's first available index 
> is 1.
> 2021-04-26 11:23:11,090 [grpc-default-executor-0] INFO 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Received install snapshot 
> notification from SCM leader: 9.134.50.210:9865 with term index: (t:2, i:1)
> 2021-04-26 11:23:11,091 [pool-13-thread-1] INFO 
> org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl: Downloading checkpoint from 
> leader SCM scm1 and reloading state from the checkpoint.
> 2021-04-26 11:23:11,092 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: Snapshot 
> Installation Request received and is in progress
> 2021-04-26 11:23:11,164 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set new 
> configuration index: 1
> configurationEntry {
>   peers {
>     id: "6e7bb411-69c5-40ea-8238-bc65cc4e9b55"
>     address: "9.134.50.210:9865"
>   }
> }
>  from snapshot
> 2021-04-26 11:23:11,166 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.util.LifeCycle: 
> 6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0:
>  NEW
> 2021-04-26 11:23:11,167 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set configuration 1: 
> [6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0],
>  old=null
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] INFO 
> org.apache.ratis.server.RaftServer$Division: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: reply 
> installSnapshot: 
> 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] DEBUG 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf: reply 
> 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,175 [grpc-default-executor-0] INFO 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> a0ba8aa4-5cff-4452-9adf-638135a9ebaf: Completed INSTALL_SNAPSHOT, 
> lastRequest: 
> 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2,
>  i:1)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to