rmdmattingly commented on code in PR #6847:
URL: https://github.com/apache/hbase/pull/6847#discussion_r2013925744


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/BackupDriver.java:
##########
@@ -120,6 +120,8 @@ private int parseAndRun(String[] args) throws IOException {
       type = BackupCommand.REPAIR;
     } else if (BackupCommand.MERGE.name().equalsIgnoreCase(cmd)) {
       type = BackupCommand.MERGE;
+    } else if (BackupCommand.CLEANUP.name().equalsIgnoreCase(cmd)) {
+      type = BackupCommand.CLEANUP;
     } else {

Review Comment:
   I would name this something more specific, unless this command is also 
intended to clean up entries left behind by full and incremental backups.



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:
##########
@@ -853,6 +869,188 @@ protected void printUsage() {
     }
   }
 
+  /**
+   * The {@code CleanupCommand} class is responsible for removing Write-Ahead Log (WAL) and
+   * bulk-loaded files that are no longer needed for Point-in-Time Recovery (PITR).
+   * <p>
+   * The cleanup process follows these steps:
+   * <ol>
+   * <li>Identify the oldest full backup and its start timestamp.</li>
+   * <li>Delete WAL files older than this timestamp, as they are no longer usable for PITR with any
+   * backup.</li>
+   * </ol>

Review Comment:
   The standard approach in HBase is to delete old files by extending 
[`BaseFileCleanerDelegate`](https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/BaseFileCleanerDelegate.java#L32).
 For example, the 
[`BackupLogCleaner`](http://github.com/apache/hbase/blob/c477901dec8fbe4b6a745065ba65fd6808746ebf/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L57)
 already handles cleaning up WALs as they relate to backups.
   
   These cleaners should be run by the HMaster's 
[`CleanerChore`](https://github.com/apache/hbase/blob/1ddb5bb43cfe4f543710a84884a5df20d02ff0a8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java#L54),
 which ensures that we only delete files in the intersection of _all_ 
cleaners' outputs, i.e. files that every configured cleaner agrees are 
deletable. On top of that critical safety guarantee, the chore runs 
periodically and automatically. For more sophisticated HBase operators this 
matters, because manual steps for routine operations like backups do not 
scale well.
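
   To illustrate the intersection behaviour, here is a minimal, self-contained 
sketch. It does not use the HBase classes: `CleanerChainSketch`, `FileInfo`, 
`CleanerDelegate`, `ttlCleaner`, and `backupCleaner` are hypothetical names 
made up for this example, and the timestamps are arbitrary. In the real code 
the per-file hook is `BaseFileCleanerDelegate#isFileDeletable(FileStatus)`; 
the sketch only models the idea that a file is deleted when every cleaner 
votes for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of a cleaner chain. All names are hypothetical;
 * this is NOT the HBase CleanerChore or BaseFileCleanerDelegate API.
 */
public class CleanerChainSketch {

  /** Hypothetical stand-in for a file: a path and a modification time. */
  public static final class FileInfo {
    public final String path;
    public final long modificationTime;

    public FileInfo(String path, long modificationTime) {
      this.path = path;
      this.modificationTime = modificationTime;
    }
  }

  /** Hypothetical stand-in for a cleaner delegate: votes on a single file. */
  public interface CleanerDelegate {
    boolean isFileDeletable(FileInfo file);
  }

  /** Votes "deletable" only for files older than a time-to-live. */
  public static CleanerDelegate ttlCleaner(long now, long ttlMillis) {
    return file -> now - file.modificationTime > ttlMillis;
  }

  /** Votes "deletable" only for files older than the oldest full backup's start. */
  public static CleanerDelegate backupCleaner(long oldestFullBackupStartTs) {
    return file -> file.modificationTime < oldestFullBackupStartTs;
  }

  /** Keeps only the files that every cleaner agrees are deletable. */
  public static List<FileInfo> deletableFiles(List<FileInfo> candidates,
      List<CleanerDelegate> cleaners) {
    List<FileInfo> remaining = new ArrayList<>(candidates);
    for (CleanerDelegate cleaner : cleaners) {
      // Each cleaner can only shrink the set, so the result is the intersection.
      remaining.removeIf(file -> !cleaner.isFileDeletable(file));
    }
    return remaining;
  }

  public static void main(String[] args) {
    List<FileInfo> wals = Arrays.asList(
      new FileInfo("wal-1", 1_000L),  // old enough for both cleaners
      new FileInfo("wal-2", 6_000L),  // past the TTL, but newer than the backup start
      new FileInfo("wal-3", 9_500L)); // too recent for either cleaner
    List<CleanerDelegate> cleaners = Arrays.asList(
      ttlCleaner(10_000L, 1_000L),
      backupCleaner(5_000L));
    for (FileInfo file : deletableFiles(wals, cleaners)) {
      System.out.println(file.path); // prints wal-1
    }
  }
}
```

   Note that `wal-2` survives even though the TTL cleaner would delete it, 
because the backup-aware cleaner vetoes it. A standalone command that deletes 
WALs directly gets no such veto from the other cleaners.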


