hgromer commented on code in PR #5408:
URL: https://github.com/apache/hbase/pull/5408#discussion_r2362698949
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupUtils.java:
##########
@@ -770,4 +773,52 @@ public static String findMostRecentBackupId(String[] backupIds) {
     return BackupRestoreConstants.BACKUPID_PREFIX + recentTimestamp;
   }
+  /**
+   * roll WAL writer for all region servers and record the newest log roll result
+   */
+  public static void logRoll(Connection conn, String backupRootDir, Configuration conf)
+    throws IOException {
+    boolean legacy = conf.getBoolean("hbase.backup.logroll.legacy.used", false);
+    if (legacy) {
+      logRollV1(conn, backupRootDir);
+    } else {
+      logRollV2(conn, backupRootDir);
+    }
+  }
+
+  private static void logRollV1(Connection conn, String backupRootDir) throws IOException {
+    try (Admin admin = conn.getAdmin()) {
+      admin.execProcedure(LogRollMasterProcedureManager.ROLLLOG_PROCEDURE_SIGNATURE,
+        LogRollMasterProcedureManager.ROLLLOG_PROCEDURE_NAME,
+        ImmutableMap.of("backupRoot", backupRootDir));
+    }
+  }
+
+  private static void logRollV2(Connection conn, String backupRootDir) throws IOException {
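
For illustration only (not part of the PR diff), a minimal usage sketch of the new entry point above; the backup root path is a placeholder, and the flag is shown with its default value, which selects the v2 path:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.backup.util.BackupUtils;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public final class LogRollExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Defaults to false, which selects logRollV2; set to true to fall back to
    // the legacy procedure-based roll (logRollV1).
    conf.setBoolean("hbase.backup.logroll.legacy.used", false);
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // "hdfs://nn/backup-root" is a placeholder backup root directory.
      BackupUtils.logRoll(conn, "hdfs://nn/backup-root", conf);
    }
  }
}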
Review Comment:
   > and I don't think any additional functionality should be introduced or existing functionality should be reduced.

   I don't necessarily agree here, though I do agree v1 isn't doing this cleanup either. To clarify, this PR doesn't introduce any issues, but thanks to your changes I think we have a good way to solve the issue I've presented. This is a beta feature, and I feel we can iterate on the behavior of the system, especially if it means improving the system's efficiency by reducing storage overhead.
   1. Yes; the oldWALs can take up a non-trivial amount of space and should be cleaned up once they are no longer necessary. We're seeing cases where we store terabytes of unused, deletable data, which is expensive.
   2. I'd like to talk this out and make sure my logic makes sense. There are two types of backups. For full backups, it makes sense that we don't lose any data: we roll the WAL files and then take a snapshot of all the HFiles on the cluster, so the contents of those WAL files are backed up. For incremental backups, we roll the WAL files and then back up all WAL files from [<old_start_code>, newTimestamps). In both cases, I don't think there's any possibility of data loss. As long as we delete entries from the system table after the backup completes, we should be okay (see the schematic sketch after this list).
   3. It'd be nice, but I think it'd be a lot harder, because logRollV1 never updates the backup system table with the newer timestamps as far as I can tell. Given this feature is still in beta, I'm happy to move forward with the v2 functionality and mark v1 as deprecated.
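
   To make the ordering in point 2 concrete, here is a schematic sketch; the WALBookkeeping interface and its methods are hypothetical stand-ins for whatever per-server timestamps the v2 log roll records in the backup system table, not existing hbase-backup APIs:

   import java.io.IOException;
   import java.util.Map;

   // Hypothetical stand-in for the v2 log-roll bookkeeping kept in the backup system table.
   interface WALBookkeeping {
     // Highest WAL timestamp recorded per region server by the latest log roll.
     Map<String, Long> latestRollTimestamps(String backupRootDir) throws IOException;
     // Remove bookkeeping entries for WALs older than the given per-server timestamps.
     void pruneOlderThan(String backupRootDir, Map<String, Long> timestamps) throws IOException;
   }

   final class PostBackupCleanupSketch {
     // Ordering matters: prune only after the backup (full or incremental) has
     // completed successfully, so no WAL the backup still needs is released early.
     static void afterSuccessfulBackup(WALBookkeeping bookkeeping, String backupRootDir)
         throws IOException {
       Map<String, Long> newTimestamps = bookkeeping.latestRollTimestamps(backupRootDir);
       bookkeeping.pruneOlderThan(backupRootDir, newTimestamps);
     }
   }

   The only invariant the sketch encodes is that pruning happens strictly after the backup completes, which is the data-safety argument above.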
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]