daviftorres commented on issue #11727:
URL: https://github.com/apache/cloudstack/issues/11727#issuecomment-4721183915

   > > Looking at the code 
(https://github.com/apache/cloudstack/blob/f63118c011ecd90a81adee5fd043a966b4822c47/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java)
 I see the error handlers:
   > > 
   > > * Line 223 sets status to `Failed` when the connection fails,
   > > * Line 228 sets status to `Failed` when it times out,
   > > * Line 251 sets status to `Failed` when it fails for any other reason.
   > > 
   > > Note that line 247 sets status to `Error` when the cleanup fails, and 
that is the only time it updates the status and leaves the backup with a failing 
status behind. In all the other scenarios the backup is removed right after the 
status is set.
   > 
   > Hi [@daviftorres](https://github.com/daviftorres), the idea with setting 
error on line 247 and not deleting the Backup entry is that in this case, after 
some error, the backend wasn't able to cleanup the backup files in the storage. 
So, the user has to cleanup those files manually and then delete the backup.
   > 
   > In case of other errors, there are no backup files created on the storage, 
so the Backup can be deleted by the system after returning an appropriate 
error. If we leave it, that is just manual overhead for the user to delete. The 
User will see the error in the UI and it is being logged as well. I believe 
volume snapshots have the same behaviour.
   > 
   > Setting the status as Failed just before removing doesn't solve much, but 
I guess it is still useful for forensics when someone is looking at the backups 
table in the db.
   > 
   > Usually, CloudStack doesn't create Alerts for failure events; they are 
mostly for monitoring capacities such as storage space. But an Event for 
backup failures, at least for scheduled backups, could be a good idea. I don't 
see such events for Scheduled snapshots either, so maybe there could be a 
generic solution for such events.
   > 
   > Hope that clarifies it. Happy to discuss further.
   > 
   > About the Issue of Backups being stuck in the BackingUp state, which 
version are you running? There was a bug in the nasbackup.sh script due to 
which, if there was some connection error or mount error, the Backup job could 
be stuck indefinitely or for a very long time. I fixed it in 4.21. I haven't 
seen backup jobs stuck in my testing so far in 4.21 and 4.22.
   > 
   > 
[79f83db](https://github.com/apache/cloudstack/commit/79f83dbbbd9698140a76ef6d93590c9077e34fb2)
   
   Here is my take on this and why I believe this issue should be revisited:
   
   - "the user has to cleanup those files manually and then delete the backup"
     - I strongly believe users would not mind cleaning up failed backups in 
ACS, as this gives them awareness that backups are failing and a chance to fix 
the root cause.
   
   - "there are no backup files created on the storage"
     - This is only partially true. I have confirmed that in many cases the 
files are there. I even successfully restored backups by copying the leftover 
files to another hypervisor.
   
   Combining both points: by leaving the failed backup entry visible to the 
user, they gain a meaningful indication of the failure. When they manually 
delete it, ACS can properly remove any orphaned files left in storage.
   
   See this real-world example where multiple backups failed last night. This 
type of information is extremely valuable to make visible to users.
   
   <img width="811" height="723" alt="Image" 
src="https://github.com/user-attachments/assets/d6a89843-e10c-40c8-bd8a-e521ca32f59c"
 />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
