Re: [PR] HBASE-28643 An unbounded backup failure message can cause an irrecoverable state for the given backup [hbase]

via GitHub Wed, 24 Jul 2024 08:33:18 -0700


DieterDP-ng commented on code in PR #6088:
URL: https://github.com/apache/hbase/pull/6088#discussion_r1690024086



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java:
##########
@@ -293,7 +293,19 @@ public void updateBackupInfo(BackupInfo info) throws 
IOException {
     }
     try (Table table = connection.getTable(tableName)) {
       Put put = createPutForBackupInfo(info);
-      table.put(put);
+      try {
+        table.put(put);
+      } catch (Exception e) {
+        // If the BackupInfo update can't be processed, then we should fall 
back to
+        // the previous BackupInfo, but also update it to reflect the failure.
+        LOG.error("Failed to update BackupInfo for {}. Marking as failed", 
info.getBackupId(), e);
+        BackupInfo legacyInfo = readBackupInfo(info.getBackupId());
+        if (legacyInfo != null) {
+          legacyInfo.setFailedMsg("Failed to update BackupInfo. Error: " + 
e.getMessage());
+          table.put(createPutForBackupInfo(legacyInfo));

Review Comment:
   I don't think adding this logic here will have any noticable effect. Since 
an Exception is thrown in this method, calling code will most likely end up in 
`TableBackupClient#failBackup`, which will overwrite the failure message (with 
the same message: the one from the Exception).
   
   So in the interest of keeping things simple, I'm pro reverting this specific 
change. The limiting of the error message length should already ensure a proper 
error gets stored.
   
   I wouldn't oppose some kind of retry mechanism for failed PUTs, but I'd 
expect that connection issues are automatically retried when executing PUTs, 
and a content-based PUT failure will still fail when retried.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-28643 An unbounded backup failure message can cause an irrecoverable state for the given backup [hbase]

Reply via email to