DieterDP-ng commented on code in PR #6040:
URL: https://github.com/apache/hbase/pull/6040#discussion_r1761684621


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java:
##########
@@ -80,25 +80,32 @@ public void init(Map<String, Object> params) {
     }
   }
 
-  private Map<Address, Long> getServersToOldestBackupMapping(List<BackupInfo> 
backups)
+  private Map<Address, Long> 
serversToLatestUnusedLogTimestamp(List<BackupInfo> backups)
     throws IOException {
-    Map<Address, Long> serverAddressToLastBackupMap = new HashMap<>();
+    Map<Address, Long> serversToLatestUnusedLogTimestamp = new HashMap<>();
 
-    Map<TableName, Long> tableNameBackupInfoMap = new HashMap<>();
-    for (BackupInfo backupInfo : backups) {
+    Map<String, BackupInfo> mostRecentBackupPerRootDir = new HashMap<>();
+    for (BackupInfo backup : backups) {
+      BackupInfo existingEntry = 
mostRecentBackupPerRootDir.get(backup.getBackupRootDir());
+      if (existingEntry == null || existingEntry.getStartTs() < 
backup.getStartTs()) {
+        mostRecentBackupPerRootDir.put(backup.getBackupRootDir(), backup);
+      }
+    }
+
+    for (BackupInfo backupInfo : mostRecentBackupPerRootDir.values()) {
       for (TableName table : backupInfo.getTables()) {
-        tableNameBackupInfoMap.putIfAbsent(table, backupInfo.getStartTs());
-        if (tableNameBackupInfoMap.get(table) <= backupInfo.getStartTs()) {
-          tableNameBackupInfoMap.put(table, backupInfo.getStartTs());
-          for (Map.Entry<String, Long> entry : 
backupInfo.getTableSetTimestampMap().get(table)
-            .entrySet()) {
-            
serverAddressToLastBackupMap.put(Address.fromString(entry.getKey()), 
entry.getValue());
+        for (Map.Entry<String, Long> entry : 
backupInfo.getTableSetTimestampMap().get(table)
+          .entrySet()) {
+          Address address = Address.fromString(entry.getKey());
+          Long existingValue = serversToLatestUnusedLogTimestamp.get(address);
+          if (existingValue == null || existingValue > entry.getValue()) {
+            serversToLatestUnusedLogTimestamp.put(address, entry.getValue());
           }

Review Comment:
   Re-reading this code after 2 months, I realize it deserves some additional 
documentation.
   
   But I think the comparison is good as-is:
   - In the first loop (line 88), we track the newest/mostRecent backup per root
   - In the second loop (line 95), we check the lowest timestamp (= 
oldest/leastRecent) per address, using the backup per root
   
   So we effectively check per address which timestamp is included in all 
backups. Only wals older (= lower TS) than that boundary can be deleted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to