rmdmattingly commented on code in PR #6847:
URL: https://github.com/apache/hbase/pull/6847#discussion_r2014305469
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:
##########
@@ -853,6 +869,188 @@ protected void printUsage() {
}
}
+ /**
+ * The {@code CleanupCommand} class is responsible for removing Write-Ahead Log (WAL) and
+ * bulk-loaded files that are no longer needed for Point-in-Time Recovery (PITR).
+ * <p>
+ * The cleanup process follows these steps:
+ * <ol>
+ * <li>Identify the oldest full backup and its start timestamp.</li>
+ * <li>Delete WAL files older than this timestamp, as they are no longer usable for PITR with any
+ * backup.</li>
+ * </ol>
Review Comment:
Ah okay, thanks for the clarifications here. Maybe we could bake this
clarification into the JavaDocs.
You make some good points here, but I don't think they fully take into
account the variety of ways in which people deploy HBase.
> All other backup-related commands are currently manual.
This is true to a large extent, but the non-emergency commands have at least
been exposed in the Admin interface to make programmatic backups easily
achievable. Maybe wiring up through the Admin is a fair compromise?
> This command depends on the delete command. What we are doing here is
> identifying the first full backup in the system and deleting all WALs before
> that point.
If this operation can only follow a delete, and WALs are made useless by
said delete, then should this operation just be a part of the backup deletion
process?
> Since deleting full backups is already a manual operation, there is little
> benefit in automating this cleanup.
I don't think it's true that backup deletions are necessarily manual from an
operator's perspective. For example, a company backing up their data in S3
could be making use of bucket TTLs to clean up their old backups. In that case,
it would be nice for unusable WALs to clean themselves up organically too.
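For illustration only, the two steps described in the JavaDoc above could be expressed as a small, side-effect-free selection routine that any caller (a delete command, an Admin method, or a periodic chore) might reuse. This is a rough sketch, not the PR's implementation: `WalCleanupSketch`, `Backup`, `WalFile`, and both methods are hypothetical names rather than HBase APIs, and returning an empty list when no full backup exists is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; none of these types exist in HBase.
class WalCleanupSketch {

  /** Hypothetical stand-in for a backup record. */
  static final class Backup {
    final String id;
    final boolean full;   // true for a full backup, false for an incremental one
    final long startTs;   // backup start timestamp

    Backup(String id, boolean full, long startTs) {
      this.id = id;
      this.full = full;
      this.startTs = startTs;
    }
  }

  /** Hypothetical stand-in for a WAL file and its timestamp. */
  static final class WalFile {
    final String path;
    final long ts;

    WalFile(String path, long ts) {
      this.path = path;
      this.ts = ts;
    }
  }

  /** Step 1: find the start timestamp of the oldest full backup, if any. */
  static OptionalLong oldestFullBackupStart(List<Backup> backups) {
    return backups.stream()
        .filter(b -> b.full)
        .mapToLong(b -> b.startTs)
        .min();
  }

  /** Step 2: select WAL files strictly older than that timestamp. */
  static List<WalFile> walsToDelete(List<Backup> backups, List<WalFile> wals) {
    OptionalLong cutoff = oldestFullBackupStart(backups);
    List<WalFile> result = new ArrayList<>();
    if (!cutoff.isPresent()) {
      // Assumption: with no full backup left, this sketch selects nothing.
      return result;
    }
    for (WalFile wal : wals) {
      if (wal.ts < cutoff.getAsLong()) {
        result.add(wal);
      }
    }
    return result;
  }
}
```

Because the routine only computes which files are eligible, it could run after an explicit delete or on a schedule, which would also cover backups that disappear through something like a bucket TTL.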