[
https://issues.apache.org/jira/browse/HDDS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698826#comment-17698826
]
Ivan Andika commented on HDDS-8131:
-----------------------------------
There is an ongoing effort to integrate Incremental Checkpoint to increase the
efficiency of Snapshot.
https://issues.apache.org/jira/browse/HDDS-6510
https://issues.apache.org/jira/browse/HDDS-6961
However, since they are still in progress, we can tune the Ratis parameters
first.
> Add Configuration for OM Ratis Log Purge Tuning Parameters
> ----------------------------------------------------------
>
> Key: HDDS-8131
> URL: https://issues.apache.org/jira/browse/HDDS-8131
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.3.0
>
>
> Currently Ozone Manager enables {{raft.server.log.purge.upto.snapshot.index}}
> by default.
> However, for an OM cluster with a large metadata store, the OM leader might
> purge its Ratis logs before a slow follower has replicated them to its own
> log. The follower then needs to download the whole metadata store from the
> OM leader, which can be problematic if the leader's metadata store is too
> large.
> We should add two configurations in OM to control the Ratis purge
> parameters:
> * {{raft.server.log.purge.upto.snapshot.index}}
> ** Disabling this would guarantee that the OM leader will not purge its
> Ratis log unless all the logs have been replicated to all the followers
> (through {{commitIndex}}).
> ** This would effectively mean that a slow follower never needs to
> download the full metadata from the leader, so there is no snapshot download
> by the follower. However, for small OM metadata, it can be faster for a
> follower to download the leader's metadata snapshot than to replicate and
> apply the outstanding logs normally.
> ** For a very slow or downed follower, the OM leader cannot purge
> the log until the follower catches up to it. This might increase the disk
> space usage on the OM leader.
> ** Default would be {{true}} to preserve the current OM snapshot behavior
> * {{raft.server.log.purge.preservation.log.num}}
> ** RATIS-1626 introduces logic to preserve the latest n logs, which will not be purged
> ** Setting n > 0 while still enabling
> {{raft.server.log.purge.upto.snapshot.index}} should strike a balance between
> the cost of preserving and transferring logs and the cost of transferring a
> snapshot.
> ** Default would be 0 to preserve the current OM snapshot behavior
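The interaction of the two settings described above can be illustrated with a small, simplified model. This is only a sketch under stated assumptions, not the Ratis implementation: the class name PurgeIndexSketch, the method purgeIndex, and its parameters are hypothetical, and the model assumes that the leader would otherwise purge up to the snapshot index, that disabling the first setting caps purging at the lowest commit index among all peers, and that n > 0 keeps at least the latest n entries.

```java
import java.util.stream.LongStream;

// Hypothetical, simplified model of the purge decision; not Ratis source code.
public class PurgeIndexSketch {

    /**
     * Returns the highest log index the leader may purge after taking a snapshot.
     *
     * @param snapshotIndex          index covered by the latest snapshot
     * @param lastLogIndex           index of the latest entry in the leader's log
     * @param commitIndices          commit indices reported by all peers (assumed input)
     * @param purgeUptoSnapshotIndex models raft.server.log.purge.upto.snapshot.index
     * @param preservationLogNum     models raft.server.log.purge.preservation.log.num
     */
    public static long purgeIndex(long snapshotIndex, long lastLogIndex,
                                  long[] commitIndices,
                                  boolean purgeUptoSnapshotIndex,
                                  long preservationLogNum) {
        long suggested = snapshotIndex;
        if (!purgeUptoSnapshotIndex) {
            // Assumption: never purge past the slowest peer's commit index.
            suggested = LongStream
                .concat(LongStream.of(snapshotIndex), LongStream.of(commitIndices))
                .min()
                .getAsLong();
        }
        if (preservationLogNum > 0) {
            // Assumption: always keep the latest n entries in the log.
            suggested = Math.min(suggested, lastLogIndex - preservationLogNum);
        }
        return suggested;
    }

    public static void main(String[] args) {
        // Two peers are caught up at 1000; one slow follower is at 400.
        long[] commits = {1000, 1000, 400};
        System.out.println(purgeIndex(900, 1000, commits, true, 0));   // 900
        System.out.println(purgeIndex(900, 1000, commits, false, 0));  // 400
        System.out.println(purgeIndex(900, 1000, commits, true, 300)); // 700
    }
}
```

In this model, the defaults (true, 0) purge everything up to the snapshot, so the follower at index 400 would need a full snapshot download. Disabling the first setting keeps every entry the slow follower still needs, at the cost of leader disk space. Setting n = 300 keeps a bounded window of recent entries, which only helps followers that lag by less than that window.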
--
This message was sent by Atlassian Jira
(v8.20.10#820010)