DieterDP-ng commented on code in PR #7582:
URL: https://github.com/apache/hbase/pull/7582#discussion_r2708626185


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String, 
Long> olderTimestamps,
       } else if (currentLogTS > oldTimeStamp) {
         resultLogFiles.add(currentLogFile);
       }
-
-      // It is possible that a host in .oldlogs is an obsolete region server

Review Comment:
   >     1. In isolation, without the other changes to this PR, it means we'll 
never include the WAL files for a host that was removed for the cluster. Keep 
in mind, currentLogFile is an oldLog. So backup happens, new wals get created 
and moved to oldWAL dir, server goes offline, the boundary for that server is 
never updated, those WAL files will never be included in future backups and we 
lose data
   
   Agreed. This is the case where `newTimestamp == null`, and I agree this case 
resulted in data loss in the original code, and should be fixed by not using 
`if (newTimestamp == null || currentLogTS > newTimestamp)` as-is.
   
   >     2. In the context of this PR, we update the boundary for each host 
based on the latest WAL that was included in the backup for the host, so this 
logic actually completely irrelevant b/c newTimestamp will never be null
   
   Again, agreed, for the case where `newTimestamp == null`. Conclusion: `if 
(newTimestamp == null || currentLogTS > newTimestamp)` is incorrect as-is.
   
   However, my remark is about the case where `newTimestamp != null && 
currentLogTS > newTimestamp`.
   It's probably an extremely unlikely scenario, but imagine the following in a 
system where some HDFS operations take longer than normal (in the context of 
this PR):
   
   - A incremental backup is triggered, a log roll is triggered as a result, 
RegionServer X creates logfile L1.
   - Right after the log roll was triggered, but before 
`getLogFilesForNewBackup` lists all the oldLogs, RegionServer X receives some 
data and goes offline again, causing an oldLog L2 for X to be written to 
storage.
   - `newTimestamps` will contain an entry `X: timestamp(L1)`
   - When `getLogFilesForNewBackup` lists the oldLogs, it will see L2, which is 
not added to `newestLogs` (because this PR removes the entire code block), 
hence L2 will be included in the backup.
   
   My remark is that if, instead of deleting the check, we change it to 
   ```
   if (newTimestamp != null && currentLogTS > newTimestamp) {
     newestLogs.add(currentLogFile);
   }
   ```
   then L2 would not be included in the current backup (as I would expect), but 
it would end up in the following one.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to