jmsperu commented on PR #12843: URL: https://github.com/apache/cloudstack/pull/12843#issuecomment-4522952794
Rebased. Force-pushed 0deb4c4 and changed the base from `4.20` → `main`. What landed in main since this PR was opened (via other PRs): - `QUIESCE` arg + freeze/thaw - `EXIT_CLEANUP_FAILED` exit code - Basic `cleanup()` function (called explicitly at 6 sites) So this PR is now reduced to just the three improvements those didn't cover: 1. **`BACKUP_TIMEOUT` env var** (default 6h) bounds the `domjobinfo` wait loop in `backup_running_vm` so a stuck QEMU backup hits `domjobabort` and frees the agent's command slot rather than holding it open until the orchestrator-level timeout. 2. **`MIN_FREE_SPACE` env var** (default 1 GiB) + `check_free_space()` runs after mount and before any `qemu-img convert` in both backup paths — fail-fast on a near-full NAS instead of failing mid-write. 3. **`trap cleanup EXIT`** replaces the explicit calls as the primary cleanup mechanism so orphan NFS mounts no longer accumulate on SIGTERM/SIGINT or set -e failures between the explicit call sites. `cleanup()` is guarded by a `CLEANUP_DONE` flag so the trap doesn't re-run an already-completed cleanup from an explicit call. `cleanup()` also now resumes the VM if it's still paused — a backup that dies mid-pause currently leaves the guest stuck `paused` until an operator intervenes. Total: 71 +/3 - on one file. Ready for another look @abh1sar. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
