[ https://issues.apache.org/jira/browse/HDDS-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated HDDS-11585:
-------------------------------
Attachment: (was: image-2024-10-15-11-40-41-618.png)
> Add DN Ratis log purge parameters to detect slow follower
> ---------------------------------------------------------
>
> Key: HDDS-11585
> URL: https://issues.apache.org/jira/browse/HDDS-11585
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Attachments: image-2024-10-15-11-56-47-259.png
>
>
> Ozone Ratis pipelines seem to have an indirect way to detect a "slow
> follower" through the notifyInstallSnapshot mechanism.
> The idea is that if the leader has already purged its logs up to the snapshot
> index, and the leader's first log index is higher than the slow follower's next
> index (i.e. the logs to replicate to the slow follower have been purged), the
> leader sends a notifyInstallSnapshot request to the follower, and the
> follower calls the StateMachine#notifyInstallSnapshotFromLeader API.
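A minimal Java sketch of the leader-side condition described above, assuming a simplified leader that only compares two log indexes. `InstallSnapshotCheck` and `needsInstallSnapshotNotification` are hypothetical names used for illustration, not Ratis APIs.

```java
// Simplified model of the leader-side decision described above. The class and
// method names are hypothetical; this is not the Ratis implementation.
final class InstallSnapshotCheck {

    /**
     * Returns true when the entry the follower needs next has already been purged
     * from the leader's log, so the leader cannot replicate it with appendEntries
     * and must send notifyInstallSnapshot instead.
     *
     * @param leaderFirstIndex  first log index still stored by the leader
     * @param followerNextIndex next log index the leader has to send to the follower
     */
    static boolean needsInstallSnapshotNotification(long leaderFirstIndex, long followerNextIndex) {
        return leaderFirstIndex > followerNextIndex;
    }

    public static void main(String[] args) {
        // Leader's log starts at 1,000 but the follower still needs index 500.
        System.out.println(needsInstallSnapshotNotification(1_000, 500));   // true
        // Follower needs index 1,500, which the leader still has.
        System.out.println(needsInstallSnapshotNotification(1_000, 1_500)); // false
    }
}
```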
> The Datanode implementation of notifyInstallSnapshotFromLeader closes the
> pipeline. This indirectly acts as an automatic "slow follower detector", which
> might be helpful since writes by default watch for ALL_COMMITTED (i.e. the log
> index needs to be replicated to all DNs), so a slow follower increases write
> latency considerably.
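To show why one slow follower hurts ALL_COMMITTED writes, here is a simplified Java model comparing the two replication levels on a 3-node pipeline. It assumes each peer is summarized by the highest log index it has replicated; `CommitLevels` and its methods are hypothetical and only approximate how Ratis tracks per-peer commit information.

```java
import java.util.Arrays;

// Simplified model of two Ratis replication levels. The input is the highest log
// index each peer (leader included) has replicated; names are hypothetical.
final class CommitLevels {

    // ALL_COMMITTED: an index counts only once every peer has it, so the
    // slowest peer determines the result.
    static long allCommittedIndex(long[] replicatedIndexes) {
        return Arrays.stream(replicatedIndexes).min().getAsLong();
    }

    // MAJORITY_COMMITTED: an index counts once a majority of peers has it.
    static long majorityCommittedIndex(long[] replicatedIndexes) {
        long[] sorted = replicatedIndexes.clone();
        Arrays.sort(sorted);
        int majority = sorted.length / 2 + 1;
        // In ascending order, at least `majority` peers have reached this element's index.
        return sorted[sorted.length - majority];
    }

    public static void main(String[] args) {
        // Hypothetical 3-node pipeline: leader, healthy follower, slow follower.
        long[] replicated = {1_000_100, 1_000_000, 100};
        System.out.println(allCommittedIndex(replicated));      // 100
        System.out.println(majorityCommittedIndex(replicated)); // 1000000
    }
}
```

In this model, a write waiting for ALL_COMMITTED at index 1,000,000 cannot complete until the slow follower catches up, while MAJORITY_COMMITTED is already satisfied.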
> The following graph shows a follower index lag that caused prolonged cluster
> write degradation and required an administrator to close the pipeline manually.
> !image-2024-10-15-11-56-47-259.png|width=582,height=134!
> Even after the log index gap between the leader and the follower exceeds
> 1 million, the pipeline is not closed automatically.
> The root cause is that raft.server.log.purge.upto.snapshot.index defaults to
> false. This means that the leader does not purge logs until they have been
> replicated to the slow follower. Therefore, the notifyInstallSnapshot
> mechanism is never triggered.
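The root cause above can be illustrated with a simplified Java model of how the leader chooses the index to purge up to after a snapshot. It is a sketch under stated assumptions: it ignores purging by whole log segments and any minimum purge gap, and `PurgeModel`, `purgeIndex`, and the example indexes are hypothetical.

```java
import java.util.Arrays;

// Simplified model of how a Ratis leader picks the log index to purge up to
// after taking a snapshot. It ignores details such as purging whole log
// segments and a minimum purge gap; names are hypothetical.
final class PurgeModel {

    static long purgeIndex(long snapshotIndex, long[] peerCommitIndexes,
                           boolean purgeUptoSnapshotIndex) {
        if (purgeUptoSnapshotIndex) {
            // raft.server.log.purge.upto.snapshot.index = true:
            // purge everything the snapshot covers, even if a peer lags behind.
            return snapshotIndex;
        }
        // Default (false): never purge past the slowest peer's commit index.
        long slowest = Arrays.stream(peerCommitIndexes).min().getAsLong();
        return Math.min(snapshotIndex, slowest);
    }

    public static void main(String[] args) {
        long snapshotIndex = 1_000_000;
        // Hypothetical commit indexes: leader, healthy follower, slow follower.
        long[] commits = {1_000_100, 1_000_000, 100};
        long slowFollowerNextIndex = 101; // assumes it has everything up to 100

        for (boolean upto : new boolean[] {false, true}) {
            long leaderFirstIndex = purgeIndex(snapshotIndex, commits, upto) + 1;
            boolean notify = leaderFirstIndex > slowFollowerNextIndex;
            System.out.println("upto=" + upto + " firstIndex=" + leaderFirstIndex
                + " notifyInstallSnapshot=" + notify);
        }
    }
}
```

With the default, the leader's first index stays at 101, exactly what the slow follower needs next, so no notification is sent; with the flag set to true, the first index jumps to 1000001 and the follower is notified.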
> As in HDDS-8131, I propose making
> raft.server.log.purge.upto.snapshot.index and
> raft.server.log.purge.preservation.log.num configurable. The
> recommended configuration would be:
> * raft.server.log.purge.upto.snapshot.index = true
> * raft.server.log.purge.preservation.log.num = <SLOW_FOLLOWER_THRESHOLD>
> Other snapshot configurations, such as
> raft.server.snapshot.auto.trigger.threshold (dfs.ratis.snapshot.threshold),
> also need to be revisited.
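A rough Java estimate of how the recommended settings translate into a slow-follower threshold, and why snapshot frequency matters too. This is a simplified model, not Ratis code. `java.util.Properties` stands in for the real configuration object. The preservation default of 0 is an assumption. `SlowFollowerThreshold`, `detectionLag`, and the value 100000 for `<SLOW_FOLLOWER_THRESHOLD>` are hypothetical.

```java
import java.util.Properties;

// Rough estimate of the follower lag at which notifyInstallSnapshot (and hence
// pipeline close) would kick in. java.util.Properties stands in for the real
// configuration object; the model is simplified and the names are hypothetical.
final class SlowFollowerThreshold {

    static final String PURGE_UPTO_SNAPSHOT_INDEX = "raft.server.log.purge.upto.snapshot.index";
    static final String PURGE_PRESERVATION_LOG_NUM = "raft.server.log.purge.preservation.log.num";

    /**
     * Returns the lag (leader last index minus follower match index) that must be
     * exceeded before the follower needs a purged entry, or -1 if purging never
     * overtakes the slow follower.
     */
    static long detectionLag(Properties conf, long leaderLastIndex, long snapshotIndex) {
        boolean uptoSnapshot = Boolean.parseBoolean(
            conf.getProperty(PURGE_UPTO_SNAPSHOT_INDEX, "false"));
        // Assumed default of 0, i.e. no extra entries preserved.
        long preserved = Long.parseLong(conf.getProperty(PURGE_PRESERVATION_LOG_NUM, "0"));
        if (!uptoSnapshot) {
            return -1; // purge waits for the slowest peer, so it is never notified
        }
        // Purge up to the snapshot, but keep the newest `preserved` entries.
        long purgeIndex = Math.min(snapshotIndex, leaderLastIndex - preserved);
        // Equals max(preserved, leaderLastIndex - snapshotIndex).
        return leaderLastIndex - purgeIndex;
    }

    public static void main(String[] args) {
        Properties recommended = new Properties();
        recommended.setProperty(PURGE_UPTO_SNAPSHOT_INDEX, "true");
        // Hypothetical SLOW_FOLLOWER_THRESHOLD of 100,000 entries.
        recommended.setProperty(PURGE_PRESERVATION_LOG_NUM, "100000");

        long last = 5_000_000;
        // Recent snapshot: the preservation count decides the threshold.
        System.out.println(detectionLag(recommended, last, 4_990_000)); // 100000
        // Old snapshot: the distance to the snapshot decides it instead.
        System.out.println(detectionLag(recommended, last, 4_000_000)); // 1000000
        // Defaults: detection never happens.
        System.out.println(detectionLag(new Properties(), last, 4_990_000)); // -1
    }
}
```

Under this model the effective threshold is the larger of preservation.log.num and the distance from the leader's last index to the latest snapshot. If snapshots are only taken every raft.server.snapshot.auto.trigger.threshold entries, detection could therefore be delayed by up to that many entries.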
--
This message was sent by Atlassian Jira
(v8.20.10#820010)