Re: [PR] HBASE-28643 An unbounded backup failure message can cause an irrecoverable state for the given backup [hbase]

via GitHub Fri, 19 Jul 2024 08:56:44 -0700


rmdmattingly commented on code in PR #6088:
URL: https://github.com/apache/hbase/pull/6088#discussion_r1684576461



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java:
##########
@@ -293,7 +293,19 @@ public void updateBackupInfo(BackupInfo info) throws IOException {
     }
     try (Table table = connection.getTable(tableName)) {
       Put put = createPutForBackupInfo(info);
-      table.put(put);
+      try {
+        table.put(put);
+      } catch (Exception e) {
+        // If the BackupInfo update can't be processed, then we should fall back to
+        // the previous BackupInfo, but also update it to reflect the failure.
+        LOG.error("Failed to update BackupInfo for {}. Marking as failed", info.getBackupId(), e);
+        BackupInfo legacyInfo = readBackupInfo(info.getBackupId());
+        if (legacyInfo != null) {
+          legacyInfo.setFailedMsg("Failed to update BackupInfo. Error: " + e.getMessage());
+          table.put(createPutForBackupInfo(legacyInfo));

Review Comment:
   That's all correct. FWIW, we do have other means of identifying backup
failures: without a success marker file, I believe this backup could not be
used to restore a table. The problem is that we have multiple sources of truth
for whether a backup has succeeded (the marker file and the entries in the
system table), and if they fall out of sync, the two can disagree about the
backup's state. Missing updates in the system table cause failures that are
only really salvageable by purging the table and/or taking a new full backup.
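
To make the mismatch concrete, here is a minimal sketch under a simplified model. The names `BackupConsistencyCheck`, `RecordedState`, `Verdict`, and `reconcile` are hypothetical and are not part of hbase-backup; real code would read the marker file and the system table instead of taking them as parameters.

```java
import java.util.Optional;

/** Hypothetical illustration only; none of these types exist in hbase-backup. */
public class BackupConsistencyCheck {

  /** Assumed states a backup entry might carry in the system table. */
  public enum RecordedState { RUNNING, COMPLETE, FAILED }

  /** Result of comparing the marker file with the system table entry. */
  public enum Verdict { RESTORABLE, IN_PROGRESS, FAILED, INCONSISTENT }

  /**
   * Compares the two sources of truth for a single backup.
   *
   * @param markerFilePresent whether the success marker file exists
   * @param recorded the system table entry, empty if the update never landed
   */
  public static Verdict reconcile(boolean markerFilePresent, Optional<RecordedState> recorded) {
    if (!recorded.isPresent()) {
      // No system table entry at all: nothing to check the marker file against.
      return Verdict.INCONSISTENT;
    }
    RecordedState state = recorded.get();
    if (markerFilePresent) {
      // A marker file only agrees with an entry that says the backup completed.
      return state == RecordedState.COMPLETE ? Verdict.RESTORABLE : Verdict.INCONSISTENT;
    }
    switch (state) {
      case RUNNING:
        return Verdict.IN_PROGRESS;
      case FAILED:
        return Verdict.FAILED;
      default:
        // Entry claims COMPLETE but no marker file exists.
        return Verdict.INCONSISTENT;
    }
  }
}
```

In this model, a missed system table update shows up as `INCONSISTENT`, which is the case the fallback write in this hunk is trying to avoid.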



