Copilot commented on code in PR #12843:
URL: https://github.com/apache/cloudstack/pull/12843#discussion_r3306134516


##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -169,7 +187,13 @@ backup_running_vm() {
         echo "Virsh backup job failed"
         cleanup ;;

Review Comment:
   In the domjobinfo polling loop, the `Failed)` branch calls `cleanup` but 
does not exit the loop/function. With `BACKUP_TIMEOUT` defaulting to 6 hours, a 
job that fails quickly will keep the agent occupied until the timeout triggers, 
rather than failing fast.



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -262,19 +287,62 @@ mount_operation() {
   fi
 }
 
+check_free_space() {
+  local free_bytes
+  free_bytes=$(df -P "$mount_point" 2>/dev/null | awk 'NR==2 {print $4}')
+  if [[ -n "$free_bytes" ]]; then
+    # df -P reports 1K blocks; convert to bytes.
+    free_bytes=$((free_bytes * 1024))
+    if [[ $free_bytes -lt $MIN_FREE_SPACE ]]; then
+      echo "Insufficient free space on backup target: $((free_bytes / 
1048576)) MB available, $((MIN_FREE_SPACE / 1048576)) MB required"
+      exit 1
+    fi
+    log -ne "Backup target has $((free_bytes / 1073741824)) GB free space"
+  fi
+}
+
 cleanup() {
+  # Idempotent: skip if a prior explicit call already ran. Without this guard,
+  # the EXIT trap would re-run cleanup and fail on the already-unmounted point.
+  [[ $CLEANUP_DONE -eq 1 ]] && return 0
+  CLEANUP_DONE=1
+
   local status=0
 
-  rm -rf "$dest" || { echo "Failed to delete $dest"; status=1; }
-  umount "$mount_point" || { echo "Failed to unmount $mount_point"; status=1; }
-  rmdir "$mount_point" || { echo "Failed to remove mount point $mount_point"; 
status=1; }
+  # If the VM was paused mid-backup (e.g. backup-begin succeeded but the script
+  # is exiting on error or signal), resume it. Without this a failed backup
+  # leaves the guest stuck in 'paused' state until an operator intervenes.
+  if [[ -n "$VM" ]]; then
+    local vm_state
+    vm_state=$(virsh -c qemu:///system domstate "$VM" 2>/dev/null || true)
+    if [[ "$vm_state" == "paused" ]]; then
+      log -ne "Resuming paused VM $VM during backup cleanup"
+      if ! virsh -c qemu:///system resume "$VM" > /dev/null 2>&1; then
+        echo "Failed to resume VM $VM"
+        status=1
+      fi
+    fi
+  fi
+
+  if [[ -n "$dest" && -d "$dest" ]]; then
+    rm -rf "$dest" || { echo "Failed to delete $dest"; status=1; }
+  fi
+  if [[ -n "$mount_point" && -d "$mount_point" ]]; then
+    umount "$mount_point" 2>/dev/null || { echo "Failed to unmount 
$mount_point"; status=1; }
+    rmdir "$mount_point" 2>/dev/null || true
+  fi

Review Comment:
   `cleanup()` treats any `umount` failure as fatal. Because the script now 
always runs `cleanup` via an EXIT trap, this can incorrectly turn a successful 
run into `EXIT_CLEANUP_FAILED` if the mountpoint directory exists but is no 
longer a mount (e.g. already unmounted elsewhere). Check whether the directory 
is an actual mount before attempting `umount`.
   



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -262,19 +287,62 @@ mount_operation() {
   fi
 }

Review Comment:
   `mount_operation()` runs the `mount | tee` pipeline as a standalone command. 
Under `set -e` + `pipefail`, a mount failure will cause an immediate exit 
before the subsequent `$?` check, and the EXIT trap may then report a cleanup 
failure instead of a clear mount failure. Wrap the pipeline directly in an `if 
...; then` and quote arguments.



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -204,6 +228,7 @@ backup_running_vm() {
 
 backup_stopped_vm() {
   mount_operation
+  check_free_space
   mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit 
1; }
 
   IFS=","

Review Comment:
   With the new `trap cleanup EXIT`, a successful `backup_stopped_vm()` run 
must unmount/remove the temporary mountpoint before exit; otherwise `cleanup()` 
will run while the target is still mounted and will `rm -rf "$dest"`, deleting 
the backup output. Also, the current `qemu-img convert` error path only calls 
`cleanup` but does not exit, which can lead to partial backups being treated as 
success.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to