[ 
https://issues.apache.org/jira/browse/HDDS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646458#comment-17646458
 ] 

Xu Shao Hong commented on HDDS-6510:
------------------------------------

Sure [~prashantpogde], that's a good idea. Could you schedule a time for the 
meeting? I am available most of the time.

> Incremental Checkpointing Support
> ---------------------------------
>
>                 Key: HDDS-6510
>                 URL: https://issues.apache.org/jira/browse/HDDS-6510
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Xu Shao Hong
>            Assignee: Xu Shao Hong
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 2022-03-15 7.58.44.png
>
>
> Currently, every snapshot installation for OM and SCM takes a checkpoint of 
> the whole RDB and sends it to the follower. As the data stored in the RDB 
> grows, transmitting the whole checkpoint takes very long. By the time the 
> transfer finishes, the leader may already have truncated newer raft logs, so 
> the follower has to install yet another snapshot, and this can repeat.
> For example, in an OM test where the raft log index was 570767469, it took 
> around 13 minutes for the follower to install the snapshot. Since Ozone is 
> designed to overcome the limits of in-memory metadata, it should be able to 
> hold far more data than the hundred-million level. Once the OM reaches that 
> level, every snapshot installation becomes a big problem: while it runs, only 
> two raft peers are working (in a 3-node HA setup), and that condition is 
> fragile.
> Another statistic: for 16 hundred million (1.6 billion) keys, the om.db 
> directory is 45 GB, i.e. around 2.8 GB per hundred million keys. This was 
> measured through the createKey API.
> To solve the problem, we should support incremental checkpointing: the 
> leader sends only a small increment instead of the whole RDB checkpoint, 
> which reduces the transmission time. I recommend referring to the 
> implementation in Flink, but we need to store the diff between checkpoints 
> locally instead of in another storage system.
>  
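
The incremental idea in the description above can be sketched roughly as 
follows. This is only a minimal sketch, assuming a checkpoint is a flat 
directory of immutable files (as RocksDB SST files are), so the increment is 
simply the set of file names present in the new checkpoint but not in the 
previous one. The class CheckpointDiff and its methods are hypothetical names 
for illustration; they are not Ozone, Ratis, or Flink APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not an Ozone API): diffs two checkpoints by file name. */
public class CheckpointDiff {

  /** Returns the names of the regular files directly inside a checkpoint directory. */
  static Set<String> listFiles(Path checkpointDir) throws IOException {
    try (Stream<Path> entries = Files.list(checkpointDir)) {
      return entries
          .filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  /** Files the follower lacks: in the current checkpoint but not in the previous one. */
  static Set<String> filesToSend(Set<String> previous, Set<String> current) {
    Set<String> diff = new TreeSet<>(current);
    diff.removeAll(previous);
    return diff;
  }

  /** Files the follower may drop: in the previous checkpoint but no longer in the current one. */
  static Set<String> filesToDelete(Set<String> previous, Set<String> current) {
    Set<String> diff = new TreeSet<>(previous);
    diff.removeAll(current);
    return diff;
  }
}
```

With a previous checkpoint holding 000001.sst and 000002.sst and a current 
one holding 000002.sst and 000004.sst, filesToSend returns only 000004.sst 
and filesToDelete returns only 000001.sst. Comparing by name alone relies on 
the assumption that a file is never rewritten under the same name; files that 
are rewritten in place would need to be resent every time or compared by 
checksum.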



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

