[jira] [Commented] (HDDS-6510) Incremental Checkpointing Support

Xu Shao Hong (Jira) Thu, 08 Dec 2022 03:30:55 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644751#comment-17644751
 ]


Xu Shao Hong commented on HDDS-6510:
------------------------------------

Thanks [~swagle] for the reply!

[~prashantpogde] [~smeng] Please correct me if I am wrong.

The incremental snapshot here covers the whole metadata store (all RDB
tables), which is different from the FS-semantic snapshot (which mainly
records volumes, buckets, and keys).

In HA mode, all OM nodes seem to apply the snapshot request as a transaction
and save the FS snapshot independently. Such snapshots are handled through
the snapshotInfoTable.

So basically, this should be compatible with the Snapshot Feature, and it is
much simpler since it does not need to scan the content of the SST files to
compute the difference.
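
To illustrate the last point: SST files are immutable once written, so
comparing the file names in two checkpoint directories is enough to find what
a follower is missing. Below is a rough sketch in Python, not the actual Ozone
code. It assumes both checkpoints are plain local directories; the function
names, directory names, and file names are made up for illustration. Non-SST
files (CURRENT, MANIFEST, and so on) are small, so the sketch always resends
them.

```python
import os
import tempfile


def list_files(checkpoint_dir):
    """Return the set of file names in a checkpoint directory."""
    return set(os.listdir(checkpoint_dir))


def incremental_files(prev_checkpoint, new_checkpoint):
    """Compare two checkpoint directories by file name only.

    Returns (to_send, to_delete):
      to_send   - SST files that exist only in the new checkpoint, plus all
                  non-SST files of the new checkpoint (always resent).
      to_delete - SST files the follower still has but no longer needs.
    """
    prev = list_files(prev_checkpoint)
    new = list_files(new_checkpoint)
    prev_sst = {f for f in prev if f.endswith(".sst")}
    new_sst = {f for f in new if f.endswith(".sst")}
    non_sst = new - new_sst
    to_send = sorted((new_sst - prev_sst) | non_sst)
    to_delete = sorted(prev_sst - new_sst)
    return to_send, to_delete


def demo():
    """Build two fake checkpoint directories and diff them."""
    with tempfile.TemporaryDirectory() as root:
        prev = os.path.join(root, "checkpoint-1")  # hypothetical names
        new = os.path.join(root, "checkpoint-2")
        layout = {
            prev: ["000007.sst", "000009.sst", "CURRENT", "MANIFEST-000005"],
            new: ["000009.sst", "000012.sst", "CURRENT", "MANIFEST-000013"],
        }
        for directory, names in layout.items():
            os.makedirs(directory)
            for name in names:
                open(os.path.join(directory, name), "w").close()
        return incremental_files(prev, new)


if __name__ == "__main__":
    to_send, to_delete = demo()
    print("send:", to_send)
    print("delete:", to_delete)
```

In this example only 000012.sst and the small metadata files would be
transferred; 000009.sst is shared by both checkpoints and is skipped.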

> Incremental Checkpointing Support
> ---------------------------------
>
>                 Key: HDDS-6510
>                 URL: https://issues.apache.org/jira/browse/HDDS-6510
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Xu Shao Hong
>            Assignee: Xu Shao Hong
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 2022-03-15 7.58.44.png
>
>
> Currently, installing a snapshot for OM or SCM means taking a checkpoint of
> the RDB and sending it to the follower. As the data stored in the RDB grows,
> transmitting the whole checkpoint takes a very long time. This can cause the
> follower to install snapshots repeatedly: by the time it finishes, it may
> find that the leader has already truncated the newer raft logs, so it needs
> to install yet another snapshot.
> As an example from a test (OM), with the raft log index at 570767469 it takes
> around 13 minutes for the follower to install the snapshot. Since Ozone is
> designed to overcome the limitations of in-memory metadata, it should be
> able to hold far more than hundreds of millions of keys. Once the OM has
> reached that level, every snapshot installation becomes a big problem: only
> two raft peers are working in the meantime (in a 3-node HA setup), and that
> condition is fragile.
> Another statistic: for 16 hundred million (1.6 billion) keys, the size of
> the om.db directory is 45GB, i.e. around 2.8GB per hundred million keys.
> This was measured using the createKey API.
> To solve the problem, we should have incremental checkpointing. It would
> send only a small increment instead of the whole RDB checkpoint and thus
> reduce the transmission time. I recommend referring to the implementation
> in Flink, but we need to store the diff between checkpoints locally instead
> of in another storage system.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
