Ivan Andika created HDDS-11585:
----------------------------------
Summary: Add DN Ratis log purge parameters to detect slow follower
Key: HDDS-11585
URL: https://issues.apache.org/jira/browse/HDDS-11585
Project: Apache Ozone
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Ivan Andika
Assignee: Ivan Andika
Attachments: image-2024-10-15-11-40-41-618.png,
image-2024-10-15-11-56-47-259.png
The Ozone Ratis pipeline seems to have an indirect mechanism for detecting a
"slow follower" through the notifyInstallSnapshot mechanism.
The idea is that if the leader has already purged its logs up to the snapshot
index, and the leader's first index is higher than the slow follower's next
index (i.e. the log entries to replicate to the slow follower have been purged),
the leader will send a notifyInstallSnapshot request to the follower, and the
follower will call the StateMachine#notifyInstallSnapshotFromLeader API.
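The check described above can be sketched roughly as follows. This is a simplified illustration, not the actual Ratis code: the class name SlowFollowerCheck, the method needsInstallSnapshot and the sample indices are all hypothetical.

```java
// Simplified model of the leader-side check; all names here are hypothetical.
public final class SlowFollowerCheck {

  private SlowFollowerCheck() {
  }

  /**
   * Returns true when the entry the follower needs next has already been
   * purged from the leader's log, i.e. the follower's next index is lower
   * than the first index the leader still holds.
   */
  static boolean needsInstallSnapshot(long leaderFirstIndex, long followerNextIndex) {
    return followerNextIndex < leaderFirstIndex;
  }

  public static void main(String[] args) {
    // The leader purged everything below index 600_001; the follower still
    // needs index 400_001, so normal log replication is no longer possible.
    System.out.println(needsInstallSnapshot(600_001L, 400_001L));
    // Nothing has been purged yet (the leader still holds index 0), so the
    // follower can always catch up through normal log replication.
    System.out.println(needsInstallSnapshot(0L, 400_001L));
  }
}
```

In this model, a leader that never purges ahead of its slowest follower never takes the notifyInstallSnapshot path, regardless of how far the follower lags.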
The Datanode implementation of notifyInstallSnapshotFromLeader closes the
pipeline. This indirectly acts as an automatic "slow follower detector", which
might be helpful since by default writes watch for ALL_COMMITTED (i.e. the log
index needs to be replicated to all DNs), so a slow follower increases write
latency considerably.
See the following follower index lag, which caused prolonged cluster write
degradation and required an administrator to close the pipeline manually.
!image-2024-10-15-11-56-47-259.png|width=582,height=134!
Even after the log index difference between the leader and the follower exceeds
1 million, the pipeline is not automatically closed.
The root cause is that raft.server.log.purge.upto.snapshot.index defaults to
false. This means that the leader will not purge log entries until they have
been replicated to the slow follower. Therefore, the notifyInstallSnapshot
mechanism will never be triggered.
Just like HDDS-8131, I propose making
raft.server.log.purge.upto.snapshot.index and
raft.server.log.purge.preservation.log.num configurable. The recommended
configuration would be:
* raft.server.log.purge.upto.snapshot.index = true
* raft.server.log.purge.preservation.log.num = <SLOW_FOLLOWER_THRESHOLD>
Other snapshot configurations such as
raft.server.snapshot.auto.trigger.threshold (dfs.ratis.snapshot.threshold) also
need to be revisited.
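To illustrate how the two purge parameters might interact, here is a rough model of the purge index the leader would pick after taking a snapshot. It is a sketch under assumptions, not the Ratis implementation: the class PurgeIndexSketch, its method and the sample numbers are hypothetical, and it assumes that the preservation count keeps the most recent entries relative to the leader's last log index.

```java
// Simplified model of the purge decision; names and numbers are hypothetical.
public final class PurgeIndexSketch {

  private PurgeIndexSketch() {
  }

  /**
   * Returns the highest log index the leader would purge.
   *
   * @param snapshotIndex          index covered by the latest snapshot
   * @param leaderLastIndex        last index in the leader's log
   * @param minFollowerCommitIndex lowest commit index among the followers
   * @param purgeUptoSnapshotIndex models raft.server.log.purge.upto.snapshot.index
   * @param preservationLogNum     models raft.server.log.purge.preservation.log.num
   */
  static long purgeIndex(long snapshotIndex, long leaderLastIndex,
      long minFollowerCommitIndex, boolean purgeUptoSnapshotIndex,
      long preservationLogNum) {
    // With the flag off, the leader never purges past the slowest follower.
    long suggested = purgeUptoSnapshotIndex
        ? snapshotIndex
        : Math.min(snapshotIndex, minFollowerCommitIndex);
    // Assumption: always keep the most recent preservationLogNum entries.
    if (preservationLogNum > 0) {
      suggested = Math.min(suggested, leaderLastIndex - preservationLogNum);
    }
    return suggested;
  }

  public static void main(String[] args) {
    long snapshotIndex = 1_500_000L;
    long leaderLastIndex = 1_600_000L;
    long slowFollowerCommitIndex = 400_000L;

    // Flag off: purge stops at the slow follower's commit index.
    System.out.println(purgeIndex(snapshotIndex, leaderLastIndex,
        slowFollowerCommitIndex, false, 0L));
    // Flag on, threshold of 1 million entries: the purge index moves past
    // the slow follower, so its next entry is no longer available.
    System.out.println(purgeIndex(snapshotIndex, leaderLastIndex,
        slowFollowerCommitIndex, true, 1_000_000L));
  }
}
```

Under these assumptions, a follower lagging by more than the preservation count falls behind the leader's first index once a snapshot is taken, while a follower within the threshold can still catch up from the log.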