ndimiduk commented on code in PR #6088:
URL: https://github.com/apache/hbase/pull/6088#discussion_r1684179640


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java:
##########
@@ -293,7 +293,19 @@ public void updateBackupInfo(BackupInfo info) throws 
IOException {
     }
     try (Table table = connection.getTable(tableName)) {
       Put put = createPutForBackupInfo(info);
-      table.put(put);
+      try {
+        table.put(put);
+      } catch (Exception e) {
+        // If the BackupInfo update can't be processed, then we should fall 
back to
+        // the previous BackupInfo, but also update it to reflect the failure.
+        LOG.error("Failed to update BackupInfo for {}. Marking as failed", 
info.getBackupId(), e);
+        BackupInfo legacyInfo = readBackupInfo(info.getBackupId());
+        if (legacyInfo != null) {
+          legacyInfo.setFailedMsg("Failed to update BackupInfo. Error: " + 
e.getMessage());
+          table.put(createPutForBackupInfo(legacyInfo));

Review Comment:
   I see that you're calling set**Failed**Msg, which i guess implies setting a 
failed status? I missed that detail on my first review.
   
   What are the odds that a second Put call will succeed where the first 
failed? If we expect that a second attempt at the Put might succeed, we could 
instead wrap sending the original Put in a retry loop... though I suppose the 
client itself will make some retry attempts.
   
   Do we have no other means of marking the backup as having failed? Because 
the result of a failure here is dataloss over subsequent backups, perhaps we 
need a more robust system for identifying failures, to be cautious at several 
logical checkpoints. For example, maybe we can detect the absence of some value 
to recongize a failure? @DieterDP-ng, @hgromer : I'd love to hear your thoughts 
here as well.
   
   Anyway, this thread of thought is expanding beyond the scope of the ticket 
you've filed. If you want to just push the truncation change, or the truncation 
change plus a retry loop on the original Put, we could leave the rest for a 
larger issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to