jmsperu commented on PR #12843:
URL: https://github.com/apache/cloudstack/pull/12843#issuecomment-4522952794

   Rebased. Force-pushed 0deb4c4 and changed the base from `4.20` to `main`.
   
   What landed in main since this PR was opened (via other PRs):
   - `QUIESCE` arg + freeze/thaw
   - `EXIT_CLEANUP_FAILED` exit code  
   - Basic `cleanup()` function (called explicitly at 6 sites)
   
   So this PR is now reduced to just the three improvements those didn't cover:
   
   1. **`BACKUP_TIMEOUT` env var** (default 6h) bounds the `domjobinfo` wait 
loop in `backup_running_vm` so a stuck QEMU backup hits `domjobabort` and frees 
the agent's command slot rather than holding it open until the 
orchestrator-level timeout.
   
   2. **`MIN_FREE_SPACE` env var** (default 1 GiB) plus a `check_free_space()` 
helper that runs after mount and before any `qemu-img convert` in both backup 
paths, so the script fails fast on a near-full NAS instead of failing mid-write.
   
   3. **`trap cleanup EXIT`** replaces the explicit calls as the primary 
cleanup mechanism so orphan NFS mounts no longer accumulate on SIGTERM/SIGINT 
or `set -e` failures between the explicit call sites. `cleanup()` is guarded by a 
`CLEANUP_DONE` flag so the trap doesn't re-run an already-completed cleanup 
from an explicit call. `cleanup()` also now resumes the VM if it's still paused 
— a backup that dies mid-pause currently leaves the guest stuck `paused` until 
an operator intervenes.
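   
   For reviewers who want the shape of the three changes without opening the diff, here is a minimal sketch, not the code in the PR. It assumes `BACKUP_TIMEOUT` is in seconds and `MIN_FREE_SPACE` is in bytes; `job_finished`, `wait_for_backup`, `VM_NAME` and `MOUNT_POINT` are hypothetical names used only for illustration, and the completion check is a guess at how `domjobinfo` output might be inspected.

```shell
#!/bin/sh
# Illustrative sketch only. Units and helper names are assumptions.
BACKUP_TIMEOUT="${BACKUP_TIMEOUT:-21600}"       # assumed seconds (6h default)
MIN_FREE_SPACE="${MIN_FREE_SPACE:-1073741824}"  # assumed bytes (1 GiB default)
CLEANUP_DONE=0

# Hypothetical completion check: true once domjobinfo reports a completed job.
job_finished() {
  virsh domjobinfo "$1" --completed 2>/dev/null | grep -q "Completed"
}

# Poll until the job finishes; abort it once BACKUP_TIMEOUT has elapsed.
wait_for_backup() {
  local vm="$1" poll="${2:-5}" start now
  start=$(date +%s)
  while true; do
    if job_finished "$vm"; then
      return 0
    fi
    now=$(date +%s)
    if [ $(( now - start )) -ge "$BACKUP_TIMEOUT" ]; then
      echo "Backup of $vm exceeded ${BACKUP_TIMEOUT}s, aborting job" >&2
      virsh domjobabort "$vm"
      return 1
    fi
    sleep "$poll"
  done
}

# Fail before writing anything if the target has too little free space.
check_free_space() {
  local dir="$1" avail_kb
  # df -Pk prints one data line per filesystem; field 4 is available KiB.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "Could not determine free space on $dir" >&2
    return 1
  fi
  if [ $(( avail_kb * 1024 )) -lt "$MIN_FREE_SPACE" ]; then
    echo "Insufficient free space on $dir" >&2
    return 1
  fi
  return 0
}

# Idempotent cleanup: safe to call explicitly and again from the EXIT trap.
cleanup() {
  if [ "$CLEANUP_DONE" -eq 1 ]; then
    return 0
  fi
  CLEANUP_DONE=1
  # Resume the guest if the backup died while it was paused.
  if [ -n "${VM_NAME:-}" ] && [ "$(virsh domstate "$VM_NAME" 2>/dev/null)" = "paused" ]; then
    virsh resume "$VM_NAME"
  fi
  # Unmount the NAS share if it is still mounted.
  if [ -n "${MOUNT_POINT:-}" ] && mountpoint -q "$MOUNT_POINT"; then
    umount "$MOUNT_POINT"
  fi
  return 0
}
trap cleanup EXIT
```

   In a script shaped like this, a backup path would call `check_free_space "$MOUNT_POINT" || exit 1` right after mounting and `wait_for_backup "$VM_NAME" || exit 1` after starting the job. Either exit then reaches `cleanup` through the trap, and the `CLEANUP_DONE` flag keeps an earlier explicit call from being repeated.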
   
   Total: +71/-3 on one file. Ready for another look @abh1sar.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.