Ivan Andika created HDDS-11585:
----------------------------------

             Summary: Add DN Ratis log purge parameters to detect slow follower
                 Key: HDDS-11585
                 URL: https://issues.apache.org/jira/browse/HDDS-11585
             Project: Apache Ozone
          Issue Type: Improvement
          Components: Ozone Datanode
            Reporter: Ivan Andika
            Assignee: Ivan Andika
         Attachments: image-2024-10-15-11-40-41-618.png, 
image-2024-10-15-11-56-47-259.png

Ozone Ratis pipelines seem to have an indirect mechanism to detect a "slow 
follower" through the notifyInstallSnapshot mechanism.

The idea is that if the leader has already purged its logs up to the snapshot 
index, and the leader's first log index is higher than the slow follower's next 
index (i.e. the log entries to replicate to the slow follower have been 
purged), the leader will send a notifyInstallSnapshot request to the follower, 
and the follower will call the StateMachine#notifyInstallSnapshotFromLeader API.
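
The condition described above can be sketched as follows. This is only an 
illustrative model, not the actual Ratis code: the class and method names 
(SlowFollowerCheck, shouldNotifyInstallSnapshot) are hypothetical, and the 
real leader-side logic involves more state than the two indices used here.

```java
// Hypothetical, simplified model of the leader-side check; not Ratis code.
class SlowFollowerCheck {

    /**
     * Returns true when the next entry the follower needs is older than the
     * first entry the leader still has, i.e. that entry has been purged and
     * can no longer be replicated through normal log appends.
     */
    static boolean shouldNotifyInstallSnapshot(long leaderFirstLogIndex,
                                               long followerNextIndex) {
        return followerNextIndex < leaderFirstLogIndex;
    }

    public static void main(String[] args) {
        // Leader purged everything below index 1000; follower still needs 200.
        System.out.println(shouldNotifyInstallSnapshot(1000L, 200L));
        // Follower needs index 1500, which the leader still holds.
        System.out.println(shouldNotifyInstallSnapshot(1000L, 1500L));
    }
}
```

In this model, a follower only ever hits the check if the leader is allowed to 
purge entries that the follower has not received yet.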

The Datanode implementation of notifyInstallSnapshotFromLeader closes the 
pipeline. This indirectly acts as an automatic "slow follower detector", which 
might be helpful since by default writes watch for ALL_COMMITTED (i.e. the log 
index needs to be replicated to all DNs), so a slow follower increases write 
latency considerably.

See the following follower index lag, which caused prolonged cluster write 
degradation and required an administrator to close the pipeline manually.

!image-2024-10-15-11-56-47-259.png|width=582,height=134!

Even after the log index difference between leader and follower exceeded 
1 million, the pipeline was not automatically closed.

The root cause is that raft.server.log.purge.upto.snapshot.index defaults to 
false. This means that the leader will not purge log entries until they have 
been replicated to the slow follower. Therefore, the notifyInstallSnapshot 
mechanism will never be triggered.

Just like HDDS-8131, I propose making 
raft.server.log.purge.upto.snapshot.index and 
raft.server.log.purge.preservation.log.num configurable. The recommended 
configuration would be:
 * raft.server.log.purge.upto.snapshot.index = true
 * raft.server.log.purge.preservation.log.num = <SLOW_FOLLOWER_THRESHOLD>
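
To illustrate how these two settings would interact, here is a rough model of 
the purge decision. It is a sketch under stated assumptions, not the Ratis 
implementation: PurgeModel and its methods are hypothetical names, and it 
assumes that (a) with purge.upto.snapshot.index = false the leader purges no 
further than the slowest follower's index, and (b) preservation.log.num keeps 
that many of the most recent entries on the leader.

```java
// Hypothetical, simplified model of the purge decision; not Ratis code.
final class PurgeModel {

    /** Highest log index the leader would purge (inclusive) in this model. */
    static long purgeIndex(boolean purgeUptoSnapshotIndex,
                           long preservationLogNum,
                           long snapshotIndex,
                           long slowestFollowerIndex,
                           long leaderLastIndex) {
        // Assumption (a): when the flag is false, never purge past the
        // slowest follower, so its missing entries stay available.
        long candidate = purgeUptoSnapshotIndex
            ? snapshotIndex
            : Math.min(snapshotIndex, slowestFollowerIndex);
        // Assumption (b): always keep the newest preservationLogNum entries.
        if (preservationLogNum > 0) {
            candidate = Math.min(candidate, leaderLastIndex - preservationLogNum);
        }
        return candidate;
    }

    /** True if the follower's next entry has already been purged. */
    static boolean followerNeedsSnapshot(long purgeIndex, long followerNextIndex) {
        return followerNextIndex <= purgeIndex;
    }

    public static void main(String[] args) {
        long snapshotIndex = 1_500_000L;
        long leaderLastIndex = 1_500_100L;
        long slowFollowerIndex = 400_000L;   // more than 1 million behind
        long threshold = 100_000L;           // stands in for SLOW_FOLLOWER_THRESHOLD

        long withDefault = purgeIndex(false, 0L, snapshotIndex,
            slowFollowerIndex, leaderLastIndex);
        long withProposal = purgeIndex(true, threshold, snapshotIndex,
            slowFollowerIndex, leaderLastIndex);

        System.out.println(followerNeedsSnapshot(withDefault, slowFollowerIndex + 1));
        System.out.println(followerNeedsSnapshot(withProposal, slowFollowerIndex + 1));
    }
}
```

Under these assumptions, the default configuration never purges entries the 
slow follower still needs, so the pipeline stays open, while the proposed 
configuration purges past the follower once it lags by more than the 
threshold, which would let the notifyInstallSnapshot path close the pipeline.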

Other snapshot configurations such as 
raft.server.snapshot.auto.trigger.threshold (dfs.ratis.snapshot.threshold) also 
need to be revisited.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
