Re: [PR] HBASE-28562 Correct backup ancestor calculation [hbase]

via GitHub Mon, 06 May 2024 01:22:12 -0700


DieterDP-ng commented on code in PR #5868:
URL: https://github.com/apache/hbase/pull/5868#discussion_r1590689156



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java:
##########
@@ -295,51 +298,31 @@ public ArrayList<BackupImage> getAncestors(BackupInfo 
backupInfo) throws IOExcep
         
.withRootDir(backup.getBackupRootDir()).withTableList(backup.getTableNames())
         
.withStartTime(backup.getStartTs()).withCompleteTime(backup.getCompleteTs()).build();
 
-      // Only direct ancestors for a backup are required and not entire 
history of backup for this
-      // table resulting in verifying all of the previous backups which is 
unnecessary and backup
-      // paths need not be valid beyond the lifetime of a backup.
-      //
-      // RootDir is way of grouping a single backup including one full and 
many incremental backups
+      // If the image has a different rootDir, it cannot be an ancestor.
       if (!image.getRootDir().equals(backupInfo.getBackupRootDir())) {
         continue;
       }
 
-      // add the full backup image as an ancestor until the last incremental 
backup
-      if (backup.getType().equals(BackupType.FULL)) {
-        // check the backup image coverage, if previous image could be covered 
by the newer ones,
-        // then no need to add
-        if (!BackupManifest.canCoverImage(ancestors, image)) {
-          ancestors.add(image);
-        }
+      // The ancestors consist of the most recent FULL backups that cover the 
list of tables
+      // required in the new backup and all INCREMENTAL backups that came 
after one of those FULL
+      // backups.
+      if (backup.getType().equals(BackupType.INCREMENTAL)) {

Review Comment:
   I thought this was a consequence of the incremental backups using the WAL to 
track changes. Incremental backups are created using the WAL-records since the 
previous full backup. So I figured that any incremental backup would fully 
consume that historic WAL so disk space is reclaimed. Assuming the WAL would be 
cleaned up after creating the incremental backup, that means you can't have 
multiple incremental backups spanning the same time period, hence it would make 
sense to include all tables in the incremental backup that create.
   
   Some experiments last week (as result of your other feedback), learned me 
that using different root folders, it is possible to have incremental backups 
spanning the same time period. So perhaps I was wrong about my other idea as 
well.
   
   I'll also browse some source code to find a lead on this, but isn't there a 
technical design written somewhere about how backup-restore is supposed to work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] HBASE-28562 Correct backup ancestor calculation [hbase]

Reply via email to