GeorgeJahad opened a new pull request, #3980: URL: https://github.com/apache/ozone/pull/3980
## What changes were proposed in this pull request?

This PR updates the OM follower bootstrap mechanism to include the snapshot state. It considers snapshot state to be all files under *metadataDir/db.snapshots*. That includes the OM snapshot directories, as well as the snapshot diff compaction logs and backup SST files (which have been moved to the "db.snapshots" dir by this PR).

This PR adds the contents of the db.snapshots dir to the tarball sent to the follower. To reduce the size of the tarball, it does not include multiple copies of any hard-linked files. Instead, it includes a list of hard links to be generated by the follower.

Design doc here: https://docs.google.com/document/d/1cFZj-7NRxiHaZ56ndcf1Z1EqapPFy4fo4dDVIy_aCx4/edit

### Recon

Recon also uses the same tarball to initialize its copy of the OM RocksDB. Since it doesn't need the snapshot data, I've added the "includeSnapshotData" parameter to the HTTP request.

### Renamed OzoneManagerSnapshotProvider

The Ratis code uses the term "snapshot" to mean something other than what we mean. It uses "snapshot" to refer to the tarball as a whole (which now includes all of the individual "OM snapshots").

In particular, this class, in the "om/snapshot" directory, is ambiguously named:

```
org/apache/hadoop/ozone/om/snapshot/OzoneManagerSnapshotProvider.java
```

To reduce potential confusion, I've renamed it to:

```
org/apache/hadoop/ozone/om/ratis_snapshot/OmRatisSnapshotProvider.java
```

### Internal Consistency of Tarball

There are two areas of consistency I've thought about:

##### Snapshot Info Table Entries -> Snapshot Directories

There needs to be a directory for each snapshot info table entry. These directories sometimes appear a short while after the snapshot info table entry is created. This PR addresses that by ensuring the directories exist before creating the tarball (pausing for a few seconds if needed).

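As a rough illustration of the "ensure the directories exist, pausing if needed" step, here is a minimal sketch of a bounded polling wait. It is not the code in this PR: the class name `SnapshotDirWaiter`, the method `waitForDirs`, and the timeout and poll-interval parameters are all hypothetical, and it assumes the caller has already derived the expected directory paths from the snapshot info table.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;

// Hypothetical helper, not part of Ozone: waits for expected snapshot
// directories to appear on disk before a tarball is built.
final class SnapshotDirWaiter {
  private SnapshotDirWaiter() { }

  /**
   * Polls until every path in dirs is an existing directory, or until the
   * shared timeout elapses.
   *
   * @return true if all directories exist, false if the timeout expired first
   */
  static boolean waitForDirs(List<Path> dirs, Duration timeout,
      Duration pollInterval) throws InterruptedException {
    // One deadline covers all directories, so the total wait is bounded.
    long deadline = System.nanoTime() + timeout.toNanos();
    for (Path dir : dirs) {
      while (!Files.isDirectory(dir)) {
        // Overflow-safe comparison of nanoTime values.
        if (System.nanoTime() - deadline >= 0) {
          return false;
        }
        Thread.sleep(pollInterval.toMillis());
      }
    }
    return true;
  }
}
```

A caller would typically treat a `false` result as a reason to fail or retry the checkpoint request rather than ship a tarball whose snapshot info table references directories that are missing.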
##### Compaction Logs -> SST Files

If the tarball is created during compaction, the snap diff compaction logs for the most recent compaction may not be included. I'm not sure how bad a problem this is. Please consider it in your review of this PR.

### Incremental checkpointing

The addition of snapshot data will increase the size of the tarball, exacerbating the problem described here: https://issues.apache.org/jira/browse/HDDS-6510

We'll need to decide if incremental checkpointing needs to be a part of the initial snapshot release. If not, it may need to come soon afterwards; otherwise users could be stranded.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6961

## How was this patch tested?

Unit/integration tests added

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
