Re: [PR] HBASE-28562 Correct backup ancestor calculation [hbase]

via GitHub Mon, 06 May 2024 01:59:12 -0700


DieterDP-ng commented on code in PR #5868:
URL: https://github.com/apache/hbase/pull/5868#discussion_r1590726707



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java:
##########
@@ -295,51 +296,32 @@ public ArrayList<BackupImage> getAncestors(BackupInfo 
backupInfo) throws IOExcep
         
.withRootDir(backup.getBackupRootDir()).withTableList(backup.getTableNames())
         
.withStartTime(backup.getStartTs()).withCompleteTime(backup.getCompleteTs()).build();
 
-      // Only direct ancestors for a backup are required and not entire 
history of backup for this
-      // table resulting in verifying all of the previous backups which is 
unnecessary and backup
-      // paths need not be valid beyond the lifetime of a backup.
-      //
-      // RootDir is way of grouping a single backup including one full and 
many incremental backups
+      // If the image has a different rootDir, it cannot be an ancestor.
       if (!image.getRootDir().equals(backupInfo.getBackupRootDir())) {
         continue;
       }
 
-      // add the full backup image as an ancestor until the last incremental 
backup
-      if (backup.getType().equals(BackupType.FULL)) {
-        // check the backup image coverage, if previous image could be covered 
by the newer ones,
-        // then no need to add
-        if (!BackupManifest.canCoverImage(ancestors, image)) {
-          ancestors.add(image);
-        }
+      // The ancestors consist of the most recent FULL backups that cover the 
list of tables
+      // required in the new backup and all INCREMENTAL backups that came 
after one of those FULL
+      // backups.

Review Comment:
   Indeed, the current implementation assumes that you can only ask the 
ancestors for a backup that you are newly creating. I think that was also the 
intention in the original implementation.
   I based this on the mentioning of  `for the current backup` in the javadoc 
and logging of the method, the current usages of the method (all related to a 
newly created backup), and this comment in the original code:
   
   > Otherwise, this incremental backup ancestor is the dependent ancestor of 
the ongoing incremental backup
   
   I agree this functionality would be useful for historic backups as well, but 
it's unexpected that this would only work for backups still present in the 
backup table (e.g. it would not work for a new HBase instance that wants to 
restore files from cloud storage). For retrieving the ancestors of historic 
backups, it makes more sense to read the manifest instead, no?
   
   I suggest clarifying the javadoc to highlight this is only for 
currently-being-created backups.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-28562 Correct backup ancestor calculation [hbase]

Reply via email to