rmdmattingly commented on code in PR #6088:
URL: https://github.com/apache/hbase/pull/6088#discussion_r1684576461
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java:
##########
@@ -293,7 +293,19 @@ public void updateBackupInfo(BackupInfo info) throws IOException {
}
try (Table table = connection.getTable(tableName)) {
Put put = createPutForBackupInfo(info);
- table.put(put);
+ try {
+ table.put(put);
+ } catch (Exception e) {
+ // If the BackupInfo update can't be processed, then we should fall back to
+ // the previous BackupInfo, but also update it to reflect the failure.
+ LOG.error("Failed to update BackupInfo for {}. Marking as failed", info.getBackupId(), e);
+ BackupInfo legacyInfo = readBackupInfo(info.getBackupId());
+ if (legacyInfo != null) {
+ legacyInfo.setFailedMsg("Failed to update BackupInfo. Error: " + e.getMessage());
+ table.put(createPutForBackupInfo(legacyInfo));
Review Comment:
That's all correct. FWIW, we do have other means of identifying backup
failures: without a success marker file, I believe this backup could not be
used to restore a table. The problem is that we have multiple sources of truth
for whether a backup has succeeded (the marker file and the entries in the
system table), and if they fall out of sync, things get weird. Missing updates
in the system table cause failures that are only really salvageable by purging
the table and/or taking a new full backup.
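
To make the two-sources-of-truth concern concrete, here is a minimal, self-contained sketch of a consistency check between the marker file and the system table entry. It does not use any HBase API: `BackupConsistencySketch`, `TableState`, `Verdict`, and `reconcile` are hypothetical names invented for illustration, and the boolean and `Optional` arguments stand in for whatever lookups the real backup code performs.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these types exist in hbase-backup.
class BackupConsistencySketch {

    /** Stand-in for the backup state recorded in the system table. */
    public enum TableState { RUNNING, COMPLETE, FAILED }

    /** Result of comparing the marker file with the system table entry. */
    public enum Verdict { USABLE, NOT_USABLE, OUT_OF_SYNC }

    /**
     * Compares the two sources of truth for a single backup.
     *
     * @param markerFileExists whether the success marker file was found
     * @param tableState       the system table entry, empty if the update was lost
     */
    public static Verdict reconcile(boolean markerFileExists, Optional<TableState> tableState) {
        boolean tableSaysComplete =
            tableState.isPresent() && tableState.get() == TableState.COMPLETE;

        if (markerFileExists && tableSaysComplete) {
            return Verdict.USABLE;      // both sources agree on success
        }
        if (!markerFileExists && !tableSaysComplete) {
            return Verdict.NOT_USABLE;  // both sources agree: no usable backup
        }
        return Verdict.OUT_OF_SYNC;     // the sources disagree
    }

    public static void main(String[] args) {
        // prints USABLE
        System.out.println(reconcile(true, Optional.of(TableState.COMPLETE)));
        // prints NOT_USABLE
        System.out.println(reconcile(false, Optional.of(TableState.FAILED)));
        // prints OUT_OF_SYNC (marker written, system table update lost)
        System.out.println(reconcile(true, Optional.empty()));
        // prints OUT_OF_SYNC (table claims success, marker missing)
        System.out.println(reconcile(false, Optional.of(TableState.COMPLETE)));
    }
}
```

The `OUT_OF_SYNC` cases are the ones described above: a lost system table update, or a table entry that claims success with no marker file behind it.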