[
https://issues.apache.org/jira/browse/HDDS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644606#comment-17644606
]
Siddharth Wagle commented on HDDS-6510:
---------------------------------------
[~Nibiruxu] Thank you for posting the detailed design on this. The snapshots
feature already uses incremental checkpointing.
Copying snapshots folks to review this: [~ppogde] [~smeng]
> Incremental Checkpointing Support
> ---------------------------------
>
> Key: HDDS-6510
> URL: https://issues.apache.org/jira/browse/HDDS-6510
> Project: Apache Ozone
> Issue Type: New Feature
> Reporter: Xu Shao Hong
> Assignee: Xu Shao Hong
> Priority: Major
> Labels: pull-request-available
> Attachments: 2022-03-15 7.58.44.png
>
>
> Currently, every time a snapshot is installed for OM or SCM, the leader takes
> a full checkpoint of RDB and sends it to the follower. As the data stored in
> RDB grows, transmitting the whole checkpoint takes a very long time. This can
> force the follower to install a snapshot repeatedly: by the time one transfer
> finishes, the leader may already have truncated the newer raft logs, so the
> follower needs yet another snapshot.
> As an example from a test (OM): with the raft log index at 570767469, it
> takes around 13 minutes for the follower to install the snapshot. Ozone is
> designed to overcome the limits of in-memory metadata, so it should be able
> to hold far more data than the hundred-million level. Once the OM reaches
> that level, every snapshot installation becomes a big problem: only two raft
> peers are working in the meantime (in a 3-node HA setup), and that condition
> is fragile.
> Another statistic: for 16 hundred million keys, the size of the om.db
> directory is 45 GB, i.e. around 2.8 GB per hundred million keys. This was
> tested through the createKey API.
> To solve the problem, we should support incremental checkpointing: sending
> only a small increment instead of the whole RDB checkpoint, and thus reducing
> the transmission time. I recommend referring to the implementation in Flink,
> but we need to store the diff between checkpoints locally instead of in
> another storage system.
>
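To illustrate the idea in the description, here is a minimal sketch of how
the increment between two checkpoints could be computed. It relies on the
assumption that RocksDB SST files are immutable once written and that their
file names are not reused, so a file present in both checkpoints has not
changed. The function names (list_sst_files, checkpoint_diff) and the
directory layout are hypothetical; this is not the design attached to the
issue nor existing Ozone code.

```python
import os


def list_sst_files(checkpoint_dir):
    """Return the set of .sst file names found in a checkpoint directory."""
    return {name for name in os.listdir(checkpoint_dir) if name.endswith(".sst")}


def checkpoint_diff(previous_dir, current_dir):
    """Compare two checkpoint directories by SST file name.

    Returns (to_send, to_delete):
      to_send   - SST files only in the current checkpoint; the leader
                  would transfer these to the follower.
      to_delete - SST files only in the previous checkpoint (for example,
                  removed by compaction); the follower could drop these.

    Assumption: SST files are immutable and names are never reused, so a
    name present in both directories refers to identical content.
    """
    old = list_sst_files(previous_dir)
    new = list_sst_files(current_dir)
    return sorted(new - old), sorted(old - new)
```

Under this assumption only the new SST files travel over the network. The
small non-SST files in a checkpoint (such as MANIFEST, CURRENT and OPTIONS)
would presumably still be sent in full each time.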