DieterDP-ng commented on code in PR #7582:
URL: https://github.com/apache/hbase/pull/7582#discussion_r2708626185
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalBackupManager.java:
##########
@@ -228,15 +263,6 @@ private List<String> getLogFilesForNewBackup(Map<String,
Long> olderTimestamps,
} else if (currentLogTS > oldTimeStamp) {
resultLogFiles.add(currentLogFile);
}
-
- // It is possible that a host in .oldlogs is an obsolete region server
Review Comment:
> 1. In isolation, without the other changes to this PR, it means we'll
never include the WAL files for a host that was removed for the cluster. Keep
in mind, currentLogFile is an oldLog. So backup happens, new wals get created
and moved to oldWAL dir, server goes offline, the boundary for that server is
never updated, those WAL files will never be included in future backups and we
lose data
Agreed. This is the case where `newTimestamp == null`, and I agree this case
resulted in data loss in the original code, and should be fixed by not using
`if (newTimestamp == null || currentLogTS > newTimestamp)` as-is.
> 2. In the context of this PR, we update the boundary for each host
based on the latest WAL that was included in the backup for the host, so this
logic actually completely irrelevant b/c newTimestamp will never be null
Again, agreed, for the case where `newTimestamp == null`. Conclusion: `if
(newTimestamp == null || currentLogTS > newTimestamp)` is incorrect as-is.
However, my remark is about the case where `newTimestamp != null &&
currentLogTS > newTimestamp`.
It's probably an extremely unlikely scenario, but imagine the following in a
system where some HDFS operations take longer than normal (in the context of
this PR):
- A incremental backup is triggered, a log roll is triggered as a result,
RegionServer X creates logfile L1.
- Right after the log roll was triggered, but before
`getLogFilesForNewBackup` lists all the oldLogs, RegionServer X receives some
data and goes offline again, causing an oldLog L2 for X to be written to
storage.
- `newTimestamps` will contain an entry `X: timestamp(L1)`
- When `getLogFilesForNewBackup` lists the oldLogs, it will see L2, which is
not added to `newestLogs` (because this PR removes the entire code block),
hence L2 will be included in the backup.
My remark is that if, instead of deleting the check, we change it to
```
if (newTimestamp != null && currentLogTS > newTimestamp) {
newestLogs.add(currentLogFile);
}
```
then L2 would not be included in the current backup (as I would expect), but
it would end up in the following one.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]