[ 
https://issues.apache.org/jira/browse/HDDS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698826#comment-17698826
 ] 

Ivan Andika commented on HDDS-8131:
-----------------------------------

There is an ongoing effort to integrate incremental checkpoints to make 
snapshots more efficient:

https://issues.apache.org/jira/browse/HDDS-6510

https://issues.apache.org/jira/browse/HDDS-6961

However, since they are still in progress, we can tune the Ratis parameters 
first.

> Add Configuration for OM Ratis Log Purge Tuning Parameters
> ----------------------------------------------------------
>
>                 Key: HDDS-8131
>                 URL: https://issues.apache.org/jira/browse/HDDS-8131
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Manager
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>
> Currently Ozone Manager enables {{raft.server.log.purge.upto.snapshot.index}} 
> by default.
> However, for an OM cluster with a large metadata store, the OM leader might 
> purge its Ratis logs before a slow follower has replicated them. The 
> follower then needs to download the whole metadata store from the OM 
> leader, which can be problematic if the leader's metadata store is very 
> large.
> We should add two configurations in OM to control the Ratis purge 
> parameters:
>  * {{raft.server.log.purge.upto.snapshot.index}}
>  ** Disabling this would guarantee that the OM leader does not purge its 
> Ratis log until all the logs have been replicated to all the followers 
> (tracked through {{commitIndex}}).
>  ** This effectively means that a slow follower should never need to 
> download the full metadata snapshot from the leader. The trade-off is that, 
> for small OM metadata, it can be faster for a follower to download the 
> leader's metadata snapshot than to replicate and apply the outstanding logs.
>  ** For a very slow or downed follower, the OM leader cannot purge the log 
> until the follower catches up. This might increase disk space usage on the 
> OM leader.
>  ** Default would be {{true}} to preserve the current OM snapshot behavior
>  * {{raft.server.log.purge.preservation.log.num}}
>  ** RATIS-1626 introduces logic to exempt the latest n logs from purging
>  ** Setting n > 0 while still enabling 
> {{raft.server.log.purge.upto.snapshot.index}} should strike a balance 
> between the cost of preserving and transferring logs and the cost of 
> transferring a snapshot.
>  ** Default would be 0 to preserve the current OM snapshot behavior
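
The interaction of the two parameters described above can be sketched as a 
small model. This is only an illustration of the behavior as described in 
this issue, not the Ratis implementation; the function name `purge_index` 
and all of its arguments are hypothetical.

```python
def purge_index(snapshot_index, follower_commit_indexes, last_log_index,
                purge_upto_snapshot_index=True, preservation_log_num=0):
    """Return the highest log index the leader may purge (simplified model).

    All names here are hypothetical and exist only for illustration.
    """
    if purge_upto_snapshot_index:
        # Purge up to the snapshot index, even if some follower is behind it.
        candidate = snapshot_index
    else:
        # Never purge past what the slowest follower has committed.
        candidate = min([snapshot_index, *follower_commit_indexes])

    if preservation_log_num > 0:
        # Always keep the latest n entries in the log.
        candidate = min(candidate, last_log_index - preservation_log_num)
    return candidate


# Leader snapshot at index 1000, last log entry at 1200,
# one follower caught up (1000) and one slow follower (400).
followers = [1000, 400]

default = purge_index(1000, followers, 1200)
no_upto = purge_index(1000, followers, 1200, purge_upto_snapshot_index=False)
keep_500 = purge_index(1000, followers, 1200, preservation_log_num=500)
```

With the defaults, the leader purges past the slow follower, which then 
needs a snapshot. Disabling the first parameter holds the purge at the slow 
follower's index, while a non-zero preservation count keeps a window of 
recent logs that a moderately slow follower can still catch up from.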



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
