abh1sar commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3450216741
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -324,6 +553,36 @@ while [[ $# -gt 0 ]]; do
shift
shift
;;
+ -M|--mode)
+ MODE="$2"
+ shift
+ shift
+ ;;
+ --bitmap-new)
+ BITMAP_NEW="$2"
+ shift
+ shift
+ ;;
+ --bitmap-parent)
+ BITMAP_PARENT="$2"
+ shift
+ shift
+ ;;
+ --parent-paths)
+ PARENT_PATHS="$2"
+ shift
+ shift
+ ;;
+ --rebase-target)
+ REBASE_TARGET="$2"
+ shift
+ shift
+ ;;
+ --rebase-new-backing)
+ REBASE_NEW_BACKING="$2"
+ shift
+ shift
+ ;;
Review Comment:
remove these as well
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -335,8 +594,13 @@ while [[ $# -gt 0 ]]; do
esac
done
-# Perform Initial sanity checks
-sanity_checks
+# QEMU >= 4.2 and libvirt >= 7.2 are only required for backup-begin
(incremental
+# checkpoints and per-bitmap exports). Legacy full-only backups, plus delete /
+# stats / rebase operations, run on older versions just fine. Gate the version
+# check to the paths that actually need it to preserve backward compatibility.
+if [ "$OP" = "backup" ] && [ -n "$MODE" ]; then
+ sanity_checks
+fi
Review Comment:
This doesn't look right.
`sanity_checks` was being called unconditionally before. It has nothing to
do with incremental backups.
This change should be removed.
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -168,6 +200,276 @@ protected Host getVMHypervisorHost(VirtualMachine vm) {
return
resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM,
vm.getDataCenterId());
}
+ /**
+ * Returned by {@link #decideChain(VirtualMachine)} to describe the next
backup's place in
+ * the chain: full vs incremental, the bitmap name to create, and (for
incrementals) the
+ * parent bitmap and parent file path.
+ */
+ static final class ChainDecision {
+ final String mode; // "full" or "incremental"
+ final String bitmapNew;
+ final String bitmapParent; // null for full
+ // Per-volume parent backup file paths, one per current VM volume in
deviceId order.
+ // null/empty for full. Each volume needs its own parent file because
backup files
+ // are named after each volume's own UUID (root.<uuid>.qcow2 /
datadisk.<uuid>.qcow2).
+ final List<String> parentPaths;
+ final String chainId; // chain identifier this backup belongs
to
+ final int chainPosition; // 0 for full, N for the Nth incremental
in the chain
+
+ private ChainDecision(String mode, String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ this.mode = mode;
+ this.bitmapNew = bitmapNew;
+ this.bitmapParent = bitmapParent;
+ this.parentPaths = parentPaths;
+ this.chainId = chainId;
+ this.chainPosition = chainPosition;
+ }
+
+ static ChainDecision fullStart(String bitmapName) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName,
null, null,
+ UUID.randomUUID().toString(), 0);
+ }
+
+ static ChainDecision incremental(String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL,
bitmapNew, bitmapParent,
+ parentPaths, chainId, chainPosition);
+ }
+
+ boolean isIncremental() {
+ return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode);
+ }
+ }
+
+ /**
+ * Decides whether the next backup for {@code vm} should be a fresh full
or an incremental
+ * appended to the existing chain. Stopped VMs are always full (libvirt
{@code backup-begin}
+ * requires a running QEMU process). The {@code nas.backup.full.every}
ConfigKey controls
+ * how many backups (full + incrementals) form one chain before a new full
is forced.
+ *
+ * <p>The decision is anchored on the VM's {@code
nas.active_checkpoint_id} detail, which
+ * records the bitmap that currently exists on the running QEMU. After a
restore that
+ * detail is cleared, so the next backup is automatically full — even
though there may be
+ * a more recent "last backup taken" row in the database. The decision
deliberately avoids
+ * relying on "last backup taken", because that row is misleading after a
restore.</p>
+ */
+ protected ChainDecision decideChain(VirtualMachine vm) {
+ final String newBitmap = "backup-" + System.currentTimeMillis() /
1000L;
+
+ // Master switch — when the operator disables incrementals at the zone
level every
+ // backup is taken as a fresh full. Existing chains stay restorable
because each
+ // backup's metadata is kept independently; restoring an incremental
still walks its
+ // own chain (the per-backup chain_id / parent_backup_id details
persist regardless
+ // of the live config). The next backup with this flag back on starts
a new chain.
+ Boolean incrementalEnabled =
NASBackupIncrementalEnabled.valueIn(vm.getDataCenterId());
+ if (incrementalEnabled == null || !incrementalEnabled) {
+ return ChainDecision.fullStart(newBitmap);
Review Comment:
Also return mode as `legacy-full` as suggested in the other comment
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +131,86 @@ backup_running_vm() {
mount_operation
mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit
1; }
+ # Determine effective mode for this run.
+ # Legacy callers (no -M argument) get the original full-only behavior with
no checkpoint.
+ # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required
args before
+ # invoking the script; the case below is a defensive fallback for direct
invocations.
+ local effective_mode="${MODE:-legacy-full}"
+ local make_checkpoint=0
+ case "$effective_mode" in
+ incremental|full)
+ make_checkpoint=1
+ ;;
+ legacy-full)
+ make_checkpoint=0
+ ;;
+ *)
+ echo "Unknown mode: $effective_mode"
+ cleanup
+ exit 1
+ ;;
+ esac
+
+ # When incremental, make sure the parent checkpoint is registered with
libvirt. CloudStack
+ # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory
checkpoint
+ # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it
on start). A
+ # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already
exists" — and
+ # qemu-img cannot drop the bitmap on a running disk (the image is
write-locked). The parent
+ # must instead be re-registered with --redefine, using the FULL checkpoint
XML this script
+ # saved alongside the parent backup when it was taken (a minimal/synthesized
XML is rejected
+ # by libvirt's checkpoint schema on redefine, so the full dump is required).
+ if [[ "$effective_mode" == "incremental" ]]; then
+ if ! virsh -c qemu:///system checkpoint-list "$VM" --name 2>/dev/null |
grep -qx "$BITMAP_PARENT"; then
+ parent_first="${PARENT_PATHS%%,*}"
+ parent_cp_xml="$(dirname "$parent_first")/$BITMAP_PARENT.checkpoint.xml"
+ if [[ -f "$parent_cp_xml" ]] && \
+ virsh -c qemu:///system checkpoint-create "$VM" --xmlfile
"$parent_cp_xml" --redefine > /dev/null 2>&1; then
+ : # parent checkpoint re-registered; the incremental can proceed
against it
+ else
+ # No saved checkpoint XML (e.g. a backup taken before this fix) or
redefine failed.
+ # Signal the Java wrapper to retry as a full backup so the chain
restarts cleanly
+ # instead of failing the backup. The wrapper is responsible for the
retry and for
+ # recording incrementalFallback=true on the resulting BackupAnswer.
+ log -e "incremental: parent checkpoint $BITMAP_PARENT could not be
re-registered — exiting $EXIT_INCREMENTAL_UNSUPPORTED for caller-driven
fallback"
+ cleanup
+ exit $EXIT_INCREMENTAL_UNSUPPORTED
+ fi
Review Comment:
Can't we just set $effective_mode="full" here instead of returning? I think
that will make the code much simpler.
We can remove the below check as well since backup of stopped VM should not
be sent using incremental mode anyway.
```
backup_stopped_vm() {
# Stopped VMs cannot use libvirt's backup-begin (no QEMU process). Take a
full
# backup via qemu-img convert. If the caller asked for incremental, signal
the
# Java wrapper to retry as full and record the fallback on the
BackupAnswer.
if [[ "$MODE" == "incremental" ]]; then
log -e "incremental: VM stopped — exiting $EXIT_INCREMENTAL_UNSUPPORTED
for caller-driven fallback to full"
exit $EXIT_INCREMENTAL_UNSUPPORTED
fi
```
With these two changes we can remove EXIT_INCREMENTAL_UNSUPPORTED completely
from the script and the wrapper
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -238,6 +413,51 @@ delete_backup() {
rmdir $mount_point
}
+# Rebase an existing backup qcow2 (e.g. a chain child) onto a new backing
parent so the chain
+# stays valid after a middle backup is deleted. Both --target and
--new-backing are passed as
+# paths relative to the NAS mount root; we resolve them under $mount_point and
write the new
+# backing reference relative to the target file's directory (mount points are
ephemeral).
+rebase_backup() {
+ mount_operation
+
+ if [[ -z "$REBASE_TARGET" || -z "$REBASE_NEW_BACKING" ]]; then
+ echo "rebase requires --rebase-target and --rebase-new-backing"
+ cleanup
+ exit 1
+ fi
+
+ local target_abs="$mount_point/$REBASE_TARGET"
+ local backing_abs="$mount_point/$REBASE_NEW_BACKING"
+ if [[ ! -f "$target_abs" ]]; then
+ echo "Rebase target file does not exist: $target_abs"
+ cleanup
+ exit 1
+ fi
+ if [[ ! -f "$backing_abs" ]]; then
+ echo "New backing file does not exist: $backing_abs"
+ cleanup
+ exit 1
+ fi
+ local target_dir
+ target_dir=$(dirname "$target_abs")
+ local backing_rel
+ backing_rel=$(realpath --relative-to="$target_dir" "$backing_abs")
+
+ # SAFE rebase (no -u): qemu-img reads blocks from the old chain and writes
them into
+ # the target where the new chain doesn't cover them. This is the "merge
into" semantic
+ # required when we're about to delete the old immediate parent — the target
needs to
+ # absorb the to-be-deleted parent's blocks so the chain remains consistent
against the
+ # new (further-back) backing.
+ if ! qemu-img rebase -b "$backing_rel" -F qcow2 "$target_abs" >> "$logFile"
2> >(cat >&2); then
+ echo "qemu-img rebase failed for $target_abs onto $backing_rel"
+ cleanup
+ exit 1
+ fi
+ sync
+ umount $mount_point
+ rmdir $mount_point
Review Comment:
rebase_backup() is not being called now. Remove all occurrences from script.
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -205,12 +507,20 @@ public Pair<Boolean, Backup> takeBackup(final
VirtualMachine vm, Boolean quiesce
final String backupPath = String.format("%s/%s", vm.getInstanceName(),
new
SimpleDateFormat("yyyy.MM.dd.HH.mm.ss").format(creationDate));
- BackupVO backupVO = createBackupObject(vm, backupPath);
+ // Decide full vs incremental for this backup. Stopped VMs are always
full
+ // (libvirt backup-begin requires a running QEMU process).
+ ChainDecision decision = decideChain(vm);
+
+ BackupVO backupVO = createBackupObject(vm, backupPath,
decision.isIncremental() ? "INCREMENTAL" : "FULL");
TakeBackupCommand command = new
TakeBackupCommand(vm.getInstanceName(), backupPath);
command.setBackupRepoType(backupRepository.getType());
command.setBackupRepoAddress(backupRepository.getAddress());
command.setMountOptions(backupRepository.getMountOptions());
command.setQuiesce(quiesceVM);
+ command.setMode(decision.mode);
+ command.setBitmapNew(decision.bitmapNew);
+ command.setBitmapParent(decision.bitmapParent);
+ command.setParentPaths(decision.parentPaths);
Review Comment:
Let's have 3 modes here as well: incremental | full and | legacy_full
And not set any bitmap fields or parent path if the mode is legacy_full.
##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java:
##########
@@ -69,8 +84,60 @@ public Answer execute(TakeBackupCommand command,
LibvirtComputingResource libvir
}
}
- List<String[]> commands = new ArrayList<>();
- commands.add(new String[]{
+ final String requestedMode = command.getMode();
+ Pair<Integer, String> result =
runBackupScript(libvirtComputingResource, command, vmName, backupRepoType,
backupRepoAddress,
+ mountOptions, backupPath, diskPaths, requestedMode,
+ command.getBitmapNew(), command.getBitmapParent(),
command.getParentPaths(), timeout);
+
+ boolean incrementalFallback = false;
+ if (result.first() == EXIT_INCREMENTAL_UNSUPPORTED &&
MODE_INCREMENTAL.equals(requestedMode)) {
+ // Script told us the incremental can't proceed (parent checkpoint
can't be
+ // re-registered, or VM is stopped). Re-invoke as full with the
same --bitmap-new
+ // so the chain restarts cleanly. Drop --bitmap-parent +
--parent-paths since
+ // they no longer apply.
+ logger.info("nasbackup.sh signalled incremental unsupported for VM
" + vmName + " — retrying as full");
+ result = runBackupScript(libvirtComputingResource, command,
vmName, backupRepoType, backupRepoAddress,
+ mountOptions, backupPath, diskPaths, MODE_FULL,
+ command.getBitmapNew(), null, null, timeout);
+ incrementalFallback = true;
+ }
+
+ boolean bitmapSeeded = true;
+ if (result.first() == EXIT_BITMAP_NOT_SEEDED) {
+ // Backup file is valid; the host just has no bitmap. Treat as
success but
+ // mark bitmapCreated=null so the orchestrator clears
active_checkpoint_id.
+ bitmapSeeded = false;
+ } else if (result.first() != 0) {
+ logger.debug("Failed to take VM backup: " + result.second());
+ BackupAnswer answer = new BackupAnswer(command, false,
result.second().trim());
+ if (result.first() == EXIT_CLEANUP_FAILED) {
+ logger.debug("Backup cleanup failed");
+ answer.setNeedsCleanup(true);
+ }
+ return answer;
+ }
+
+ String stdout = result.second().trim();
+ long backupSize = parseBackupSize(stdout, diskPaths);
+
+ BackupAnswer answer = new BackupAnswer(command, true, stdout);
+ answer.setSize(backupSize);
+ // bitmapCreated mirrors what we asked the script to create — except
when the
+ // script exited EXIT_BITMAP_NOT_SEEDED, in which case the host has no
bitmap
+ // and the orchestrator must clear active_checkpoint_id.
+ answer.setBitmapCreated(bitmapSeeded ? command.getBitmapNew() : null);
Review Comment:
bitmapSeeded is by default true. Event if incremental backup feature is not
enabled and bitmaps are not actually created.
This causes nas.active_checkpoint_id to be set in vm_details without the
feature being enabled.
I suggest returning an error if bitmaps cannot be created instead of
EXIT_BITMAP_NOT_SEEDED.
It is only being used in case of stopped VMs. bitmap --add operation should
not fail ideally and it should be ok to fail the backup if bitmap --add fails
for any disk, so that the underlying problem is dealt with instead of forcing
full backups.
And for the cases where the incremental feature is not enabled, we should
not be sending `bitmap_new` in the `TakeBackupCommand`
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -168,6 +200,276 @@ protected Host getVMHypervisorHost(VirtualMachine vm) {
return
resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM,
vm.getDataCenterId());
}
+ /**
+ * Returned by {@link #decideChain(VirtualMachine)} to describe the next
backup's place in
+ * the chain: full vs incremental, the bitmap name to create, and (for
incrementals) the
+ * parent bitmap and parent file path.
+ */
+ static final class ChainDecision {
+ final String mode; // "full" or "incremental"
+ final String bitmapNew;
+ final String bitmapParent; // null for full
+ // Per-volume parent backup file paths, one per current VM volume in
deviceId order.
+ // null/empty for full. Each volume needs its own parent file because
backup files
+ // are named after each volume's own UUID (root.<uuid>.qcow2 /
datadisk.<uuid>.qcow2).
+ final List<String> parentPaths;
+ final String chainId; // chain identifier this backup belongs
to
+ final int chainPosition; // 0 for full, N for the Nth incremental
in the chain
+
+ private ChainDecision(String mode, String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ this.mode = mode;
+ this.bitmapNew = bitmapNew;
+ this.bitmapParent = bitmapParent;
+ this.parentPaths = parentPaths;
+ this.chainId = chainId;
+ this.chainPosition = chainPosition;
+ }
+
+ static ChainDecision fullStart(String bitmapName) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName,
null, null,
+ UUID.randomUUID().toString(), 0);
+ }
+
+ static ChainDecision incremental(String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL,
bitmapNew, bitmapParent,
+ parentPaths, chainId, chainPosition);
+ }
+
+ boolean isIncremental() {
+ return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode);
+ }
+ }
+
+ /**
+ * Decides whether the next backup for {@code vm} should be a fresh full
or an incremental
+ * appended to the existing chain. Stopped VMs are always full (libvirt
{@code backup-begin}
+ * requires a running QEMU process). The {@code nas.backup.full.every}
ConfigKey controls
+ * how many backups (full + incrementals) form one chain before a new full
is forced.
+ *
+ * <p>The decision is anchored on the VM's {@code
nas.active_checkpoint_id} detail, which
+ * records the bitmap that currently exists on the running QEMU. After a
restore that
+ * detail is cleared, so the next backup is automatically full — even
though there may be
+ * a more recent "last backup taken" row in the database. The decision
deliberately avoids
+ * relying on "last backup taken", because that row is misleading after a
restore.</p>
+ */
+ protected ChainDecision decideChain(VirtualMachine vm) {
+ final String newBitmap = "backup-" + System.currentTimeMillis() /
1000L;
+
+ // Master switch — when the operator disables incrementals at the zone
level every
+ // backup is taken as a fresh full. Existing chains stay restorable
because each
+ // backup's metadata is kept independently; restoring an incremental
still walks its
+ // own chain (the per-backup chain_id / parent_backup_id details
persist regardless
+ // of the live config). The next backup with this flag back on starts
a new chain.
+ Boolean incrementalEnabled =
NASBackupIncrementalEnabled.valueIn(vm.getDataCenterId());
+ if (incrementalEnabled == null || !incrementalEnabled) {
+ return ChainDecision.fullStart(newBitmap);
Review Comment:
Don't set `newBitmap` if incremental backups are not enabled.
Otherwise it causes the bitmap information being persisted in backup_details
and vm_instance_details even though the bitmap didn't actually get persisted on
disk.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]