DieterDP-ng commented on code in PR #6040:
URL: https://github.com/apache/hbase/pull/6040#discussion_r1768863626
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########
@@ -81,39 +81,55 @@ public void init(Map<String, Object> params) {
}
}
- private Map<Address, Long> getServerToNewestBackupTs(List<BackupInfo> backups)
+ /**
+ * Calculates the timestamp boundary up to which all backup roots have already included the WAL.
+ * I.e. WALs with a lower (= older) or equal timestamp are no longer needed for future incremental
+ * backups.
+ */
+ private Map<Address, Long> serverToPreservationBoundaryTs(List<BackupInfo> backups)
Review Comment:
> newestBackupPerRootDir will contain: (R1: B4)
Agreed
> Since the newest backup in R1 is B4, serverToPreservationBoundaryTs will contain (S1: 20, S2: 20, S3: 20)
I think this is wrong.
To make things more concrete, I'm going to assume all backups were full
backups, and that B1 == B2 (since they have the same timestamp). I.e. the backups were:
- B1: at timestamp 0, containing tables T1 & T2
- B3: at timestamp 10, containing T1
- B4: at timestamp 20, containing T2
At this point, the data in `backupInfo.getTableSetTimestampMap()` will be:
```
T1:
  S1: 10
  S2: 10
T2:
  S2: 20
  S3: 20
```
and `serverToPreservationBoundaryTs` will be (S1: 10, S2: 10, S3: 20).
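
To illustrate how those numbers come about, here is a minimal sketch. It assumes the boundary for a server is simply the lowest timestamp recorded for that server across all tables. The class `PreservationBoundarySketch` and the method `boundaryPerServer` are hypothetical names for illustration, not the code in this PR, and plain strings stand in for `TableName` and `Address`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class PreservationBoundarySketch {

  // Hypothetical helper: for each server, keep the lowest timestamp seen across all tables.
  static Map<String, Long> boundaryPerServer(Map<String, Map<String, Long>> tableSetTimestampMap) {
    Map<String, Long> boundaries = new TreeMap<>();
    for (Map<String, Long> serverToTs : tableSetTimestampMap.values()) {
      for (Map.Entry<String, Long> entry : serverToTs.entrySet()) {
        // merge() stores the value if the server is new, otherwise keeps the minimum
        boundaries.merge(entry.getKey(), entry.getValue(), Long::min);
      }
    }
    return boundaries;
  }

  public static void main(String[] args) {
    // The table -> (server -> timestamp) data from the example above
    Map<String, Map<String, Long>> timestamps = new HashMap<>();
    timestamps.put("T1", Map.of("S1", 10L, "S2", 10L));
    timestamps.put("T2", Map.of("S2", 20L, "S3", 20L));

    System.out.println(boundaryPerServer(timestamps)); // prints {S1=10, S2=10, S3=20}
  }
}
```

S2 ends up at 10 rather than 20 because T1's entry for S2 is older, so WALs on S2 newer than 10 may still be needed for a future incremental backup of T1.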
The BackupInfo of B4 contains the log timestamps of tables not included in
B4 because of how the backup client updates them:
```
// From: FullTableBackupClient
// Updates only the rows for the tables included in B4
backupManager.writeRegionServerLogTimestamp(backupInfo.getTables(), newTimestamps);
// Reads the rows for all tables once included in a backup in this backup root
Map<TableName, Map<String, Long>> newTableSetTimestampMap =
  backupManager.readLogTimestampMap();
```
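
The write-some, read-all behaviour can be modelled with a small in-memory sketch. `LogTimestampStoreSketch`, `write`, `readAll` and `replayExample` are hypothetical stand-ins, not the real `BackupManager` API. The sketch also assumes, as in the example data, that T1 is hosted on S1 and S2, that T2 is hosted on S2 and S3, and that each table's row only records the servers hosting it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LogTimestampStoreSketch {
  // Hypothetical stand-in for the per-backup-root rows: table -> (server -> timestamp)
  private final Map<String, Map<String, Long>> rows = new TreeMap<>();

  // Overwrites only the rows of the tables that are part of the current backup.
  void write(List<String> tables, Map<String, Map<String, Long>> newTimestamps) {
    for (String table : tables) {
      rows.put(table, new TreeMap<>(newTimestamps.get(table)));
    }
  }

  // Returns a copy of the rows of every table ever written to this backup root.
  Map<String, Map<String, Long>> readAll() {
    Map<String, Map<String, Long>> copy = new TreeMap<>();
    rows.forEach((table, ts) -> copy.put(table, new TreeMap<>(ts)));
    return copy;
  }

  // Replays the three full backups from the example, then reads everything back.
  static Map<String, Map<String, Long>> replayExample() {
    LogTimestampStoreSketch store = new LogTimestampStoreSketch();
    // B1 at timestamp 0: tables T1 and T2
    store.write(List.of("T1", "T2"), Map.of(
        "T1", Map.of("S1", 0L, "S2", 0L),
        "T2", Map.of("S2", 0L, "S3", 0L)));
    // B3 at timestamp 10: table T1 only
    store.write(List.of("T1"), Map.of("T1", Map.of("S1", 10L, "S2", 10L)));
    // B4 at timestamp 20: table T2 only, so the T1 row keeps its values from B3
    store.write(List.of("T2"), Map.of("T2", Map.of("S2", 20L, "S3", 20L)));
    return store.readAll();
  }

  public static void main(String[] args) {
    System.out.println(replayExample()); // prints {T1={S1=10, S2=10}, T2={S2=20, S3=20}}
  }
}
```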