Re: [PR] [CELEBORN-1452] Master follower node metadata is out of sync after installing snapshot [celeborn]

via GitHub Thu, 06 Jun 2024 22:20:19 -0700


leixm commented on PR #2547:
URL: https://github.com/apache/celeborn/pull/2547#issuecomment-2154054970


   ```
   follower log
   24/05/23 03:08:02,421 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Reinitializing state machine.
   24/05/23 03:08:02,421 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: obsolete snapshot provided: (t:28, i:1185003348)
   
   24/05/23 03:06:53,439 WARN [master-state-machine-executor] 
AbstractMetaManager: Celeborn cluster estimated partition size changed from 
129.2 MiB to 110.5 MiB
   24/05/23 03:08:01,425 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Taking a snapshot to file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1716404881416_5912151569428133017.dat.
   24/05/23 03:08:01,428 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Saving digest 184db806c56d5780f92c98fd7d3fb35a for snapshot file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.28_1185003348.
   24/05/23 03:08:01,428 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Renaming a snapshot file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1716404881416_5912151569428133017.dat
 to 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.28_1185003348.
   24/05/23 03:08:02,421 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Reinitializing state machine.
   24/05/23 03:08:02,421 INFO [1@group-47BEDE733167-StateMachineUpdater] 
StateMachine: obsolete snapshot provided: (t:28, i:1185003348)
   
   leader log
   24/05/23 03:08:02,117 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Taking a snapshot to file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1716404882108_1183932657463440669.dat.
   24/05/23 03:08:02,120 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Saving digest 232b855f8ca673b7a2c38a8ec90c6306 for snapshot file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.28_1185003493.
   24/05/23 03:08:02,121 INFO [2@group-47BEDE733167-StateMachineUpdater] 
StateMachine: Renaming a snapshot file 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/tmp/raft_snapshot_1716404882108_1183932657463440669.dat
 to 
/mnt/ssd/0/celeborn-ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.28_1185003493.
   ```
   
   After reinitializing, follower out of sync.
   
![meta](https://github.com/apache/celeborn/assets/31424839/aadd6e29-6f20-4997-a9cf-5007243353c2)
   
   There are some variables find from follower jvm heap
   
   1. StateMachineUpdater.appliedIndex   = 1185003348
   2. SegmentedRaftLog.committedIndex = 1188743857
   3. SegmentedRaftLog startIndex = 1185003494, endIndex = 1188743857 
   4. StateMachineUpdater.snapshotIndex = 1185003348
   5. ServerState.latestedInstalledSnapshot=1185003493
   
    
   
   1. When will reinitializing be triggered?
   It is triggered after leader sends a snapshot to the follower and finish 
installing.
   
    
   
   2. When does the leader send a snapshot to the followers in the running 
state?
   When the follower's nextIndex < the leader's startIndex, that is, the log 
required by the follower no longer exists in the leader.
       2.1 Why does the log not exist in the leader?
       There is a certain synchronization delay between the leader and the 
folower. When the leader takes the snapshot first, it will purge its own 
segment log after taking it. At this time, the log with index <= snapshot index 
can be purged, because there is already a snapshot. At this time, the above 
judgment will be Take effect and then send the snapshot
   
    
   
   org.apache.ratis.server.leader.LogAppender#shouldInstallSnapshot
   
   ```
   final long followerNextIndex = getFollower().getNextIndex();
   if (followerNextIndex < getRaftLog().getNextIndex()) {
   final long logStartIndex = getRaftLog().getStartIndex();
   if (followerNextIndex < logStartIndex || (logStartIndex == 
RaftLog.INVALID_LOG_INDEX && snapshot != null))
   
   { return snapshot; }
   }
   ```
   
   3. Why is the snapshot sent by the leader to the follower not load correctly?
   When the follower receives the snapshot, it will write to snapshotDir, but 
it will not refresh the latestSnapshot, so it cannot see this snapshot when 
reinitialize. snapshot.getTermIndex().compareTo(getLastAppliedTermIndex()) <= 0 
must be true
   
    
   
   4. Why is the final metadata of the follower out of sync?
   After the leader sends the snapshot and receives the success status, it will 
update the follower's nextIndex to the sent snapshot last index + 1. The leader 
thinks that the follower has correctly installed snapsohot, but in fact it has 
not, which ultimately causes the follower to miss a section of the log. If the 
follower cannot find the log, it will lose synchronization forever.
   
    
   org.apache.ratis.server.impl.StateMachineUpdater#applyLog
   ```
   final LogEntryProto next = raftLog.get(nextIndex);
   if (next != null) {xxx}
   else {
   LOG.debug("{}: logEntry {} is null. There may be snapshot to load. state:{}",
   this, nextIndex, state);
   break;
   }
   ```
   
    
   
   5. How do we solve this problem?
   We need to modify the SimpleStateMachineStorage currently used by celeborn. 
When the snapshot install is completed, we should select the latest snapshot 
from the local disk directory and update the latestSnapshot variable.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CELEBORN-1452] Master follower node metadata is out of sync after installing snapshot [celeborn]

Reply via email to