jmsperu opened a new issue, #12899:
URL: https://github.com/apache/cloudstack/issues/12899

   # RFC: Incremental NAS Backup Support for KVM Hypervisor
   
   | Field         | Value                                      |
   |---------------|--------------------------------------------|
   | **Author**    | James Peru, Xcobean Systems Limited        |
   | **Status**    | Draft                                      |
   | **Created**   | 2026-03-27                                 |
   | **Target**    | Apache CloudStack 4.23+                    |
   | **Component** | Backup & Recovery (NAS Backup Provider)    |
   
   ---
   
   ## Summary
   
   This RFC proposes adding incremental backup support to CloudStack's NAS 
backup provider for KVM hypervisors. By leveraging QEMU dirty bitmaps and 
libvirt's `backup-begin` API, CloudStack can track changed disk blocks between 
backups and export only the delta, reducing daily backup storage consumption by 
80--95% and shortening backup windows from hours to minutes for large VMs. The 
feature is opt-in at the zone level, backward-compatible with existing 
full-backup behavior, and gracefully degrades on older QEMU/libvirt versions.
   
   ---
   
   ## Motivation
   
   CloudStack's NAS backup provider currently performs a full disk copy every 
time a backup is taken. For a 500 GB VM with five daily backups retained, that 
amounts to 2.5 TB of storage consumed. At scale -- tens or hundreds of VMs -- 
this becomes a serious operational and financial burden.
   
   **Problems with the current approach:**
   
   1. **Storage waste.** Every backup is a full copy of the entire virtual 
disk, regardless of how little data actually changed since the last backup.
   2. **Long backup windows.** Copying hundreds of gigabytes over NFS or SMB 
takes hours, increasing the risk of I/O contention on production workloads.
   3. **Network bandwidth pressure.** Full-disk transfers saturate the storage 
network during backup windows, impacting other VMs on the same host.
   4. **Uncompetitive feature set.** VMware (Changed Block Tracking / VADP), 
Proxmox Backup Server, and Veeam all support incremental backups natively. 
CloudStack's lack of incremental backup is a common complaint on the users@ 
mailing list and a blocker for adoption in environments with large VMs.
   
   **What incremental backup achieves:**
   
   - Only changed blocks are transferred and stored after the initial full 
backup.
   - A typical daily incremental for a 500 GB VM with moderate write activity 
is 5--15 GB, a reduction of 97--99% compared to a full copy.
   - Backup completes in minutes rather than hours.
   - Retention of 30+ daily restore points becomes economically feasible.
   
   ---
   
   ## Proposed Design
   
   ### Backup Chain Model
   
   Incremental backups form a chain anchored by a periodic full backup:
   
   ```
   Full (Day 0) -> Inc 1 (Day 1) -> Inc 2 (Day 2) -> ... -> Inc 6 (Day 6) -> 
Full (Day 7) -> ...
   ```
   
   Restoring to any point in time requires the full backup plus every 
incremental up to the desired restore point. To bound restore complexity and 
protect against chain corruption, a new full backup is forced at a configurable 
interval.
   
   **Global settings (zone scope):**
   
   | Setting                              | Type    | Default | Description     
                                     |
   
|--------------------------------------|---------|---------|------------------------------------------------------|
   | `nas.backup.incremental.enabled`     | Boolean | `false` | Enable 
incremental backup for the zone               |
   | `nas.backup.full.interval`           | Integer | `7`     | Days between 
full backups                             |
   | `nas.backup.incremental.max.chain`   | Integer | `6`     | Max incremental 
backups before forcing a new full     |
   
   When `nas.backup.incremental.enabled` is `false` (the default), behavior is 
identical to today -- every backup is a full copy. Existing deployments are 
unaffected.
   
   ---
   
   ### Technical Approach
   
   #### 1. Dirty Bitmap Tracking (QEMU Layer)
   
   QEMU 4.0 introduced persistent dirty bitmaps: per-disk bitmaps that record 
which blocks have been written since the bitmap was created. These bitmaps 
survive QEMU restarts (they are stored in the qcow2 image header) and are the 
foundation for incremental backup.
   
   **Lifecycle:**
   
   1. When incremental backup is enabled for a VM, the agent creates a 
persistent dirty bitmap on each virtual disk via QMP:
      ```json
      {
        "execute": "block-dirty-bitmap-add",
        "arguments": {
          "node": "drive-virtio-disk0",
          "name": "backup-20260327",
          "persistent": true
        }
      }
      ```
   2. QEMU automatically sets bits in this bitmap whenever the guest writes to 
a block.
   3. At backup time, the bitmap tells the backup process exactly which blocks 
to read.
   4. After a successful backup, a new bitmap is created for the next cycle and 
the old bitmap is optionally removed.
   
   #### 2. Backup Flow
   
   **Full backup (Day 0 or every `nas.backup.full.interval` days):**
   
   ```bash
   # 1. Export the entire disk to the NAS mount
   qemu-img convert -f qcow2 -O qcow2 \
     /var/lib/libvirt/images/vm-disk.qcow2 \
     /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2
   
   # 2. Create a new dirty bitmap to track changes from this point
   virsh qemu-monitor-command $DOMAIN --hmp \
     'block-dirty-bitmap-add drive-virtio-disk0 backup-20260327 persistent=true'
   ```
   
   **Incremental backup (Day 1 through Day N):**
   
   ```bash
   # 1. Use libvirt backup-begin with incremental mode
   # This exports only blocks dirty since bitmap "backup-20260327"
   cat > /tmp/backup.xml <<'XML'
   <domainbackup mode="push">
     <disks>
       <disk name="vda" backup="yes" type="file">
         <target file="/mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2"
                 type="qcow2"/>
         <driver type="qcow2"/>
       </disk>
     </disks>
     <incremental>backup-20260327</incremental>
   </domainbackup>
   XML
   
   virsh backup-begin $DOMAIN /tmp/backup.xml
   
   # 2. Wait for completion
   virsh domjobinfo $DOMAIN --completed
   
   # 3. Rotate bitmaps: remove old, create new
   virsh qemu-monitor-command $DOMAIN --hmp \
     'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327'
   virsh qemu-monitor-command $DOMAIN --hmp \
     'block-dirty-bitmap-add drive-virtio-disk0 backup-20260328 persistent=true'
   ```
   
   **New full backup cycle (Day 7):**
   
   ```bash
   # Remove all existing bitmaps
   virsh qemu-monitor-command $DOMAIN --hmp \
     'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327'
   
   # Take a full backup (same as Day 0)
   # Optionally prune expired chains from NAS
   ```
   
   #### 3. Restore Flow
   
   Restoring from an incremental chain requires replaying the full backup plus 
all incrementals up to the target restore point. This is handled entirely 
within `nasbackup.sh` and is transparent to the management server and the end 
user.
   
   **Example: Restore to Day 3 (full + 3 incrementals):**
   
   ```bash
   # 1. Create a working copy from the full backup
   cp /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 /tmp/restored.qcow2
   
   # 2. Apply each incremental in order using qemu-img rebase
   #    Each incremental is a thin qcow2 containing only changed blocks.
   #    Rebasing merges the incremental's blocks into the chain.
   qemu-img rebase -u -b /tmp/restored.qcow2 \
     /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2
   
   qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 \
     /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2
   
   qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 \
     /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2
   
   # 3. Flatten the chain into a single image
   qemu-img convert -f qcow2 -O qcow2 \
     /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 \
     /tmp/vm-restored-final.qcow2
   
   # 4. Return the flattened image for CloudStack to import
   ```
   
   An alternative approach uses `qemu-img commit` to merge each layer down. The 
implementation will benchmark both methods and choose the faster one for large 
images.
   
   #### 4. Database Schema Changes
   
   **Modified table: `backups`**
   
   | Column             | Type         | Description                            
        |
   
|--------------------|--------------|------------------------------------------------|
   | `backup_type`      | VARCHAR(16)  | `FULL` or `INCREMENTAL`                
        |
   | `parent_backup_id` | BIGINT (FK)  | For incremental: ID of the previous 
backup     |
   | `bitmap_name`      | VARCHAR(128) | QEMU dirty bitmap identifier for this 
backup   |
   | `chain_id`         | BIGINT (FK)  | Links to the backup chain this backup 
belongs to |
   
   **New table: `backup_chains`**
   
   | Column            | Type        | Description                              
      |
   
|-------------------|-------------|------------------------------------------------|
   | `id`              | BIGINT (PK) | Auto-increment primary key               
      |
   | `vm_instance_id`  | BIGINT (FK) | The VM this chain belongs to             
      |
   | `full_backup_id`  | BIGINT (FK) | The full backup anchoring this chain     
      |
   | `state`           | VARCHAR(16) | `ACTIVE`, `SEALED`, `EXPIRED`            
      |
   | `created`         | DATETIME    | When the chain was started               
      |
   
   **Schema migration** will be provided as a Liquibase changeset, consistent 
with CloudStack's existing migration framework. The new columns are nullable to 
maintain backward compatibility with existing backup records.
   
   #### 5. Management Server Changes
   
   **`BackupManagerImpl` (orchestration):**
   
   - Before taking a backup, query the active chain for the VM.
   - If no active chain exists, or the chain has reached 
`nas.backup.incremental.max.chain` incrementals, or `nas.backup.full.interval` 
days have elapsed since the last full backup: schedule a full backup and start 
a new chain.
   - Otherwise: schedule an incremental backup linked to the previous backup in 
the chain.
   - On backup failure: if the bitmap is suspected corrupt, mark the chain as 
`SEALED` and force a full backup on the next run.
   
   **`NASBackupProvider.takeBackup()`:**
   
   - Accept a new parameter `BackupType` (FULL or INCREMENTAL).
   - For incremental: pass the parent backup's bitmap name and NAS path to the 
agent command.
   
   **`TakeBackupCommand` / `TakeBackupAnswer`:**
   
   - Add fields: `backupType` (FULL/INCREMENTAL), `parentBackupId`, 
`bitmapName`, `parentBackupPath`.
   - The answer includes the actual size of the backup (important for 
incrementals, which are much smaller than the disk size).
   
   **`RestoreBackupCommand`:**
   
   - Add field: `backupChain` (ordered list of backup paths from full through 
the target incremental).
   - The agent reconstructs the full image from the chain before importing.
   
   #### 6. KVM Agent Changes
   
   **`LibvirtTakeBackupCommandWrapper`:**
   
   - For `FULL` backups: existing behavior (qemu-img convert), plus create the 
initial dirty bitmap.
   - For `INCREMENTAL` backups: use `virsh backup-begin` with `<incremental>` 
XML, then rotate bitmaps.
   - Pre-flight check: verify QEMU version >= 4.0 and libvirt version >= 6.0. 
If not met, fall back to full backup and log a warning.
   
   **`nasbackup.sh` enhancements:**
   
   - New flag `-i` for incremental mode.
   - New flag `-p <parent_path>` to specify the parent backup on the NAS.
   - New flag `-b <bitmap_name>` to specify which dirty bitmap to use.
   - New subcommand `restore-chain` that accepts an ordered list of backup 
paths and produces a flattened image.
   
   **`LibvirtRestoreBackupCommandWrapper`:**
   
   - If the restore target is an incremental backup, request the full chain 
from the management server and pass it to `nasbackup.sh restore-chain`.
   
   #### 7. API Changes
   
   **Existing API: `createBackup`**
   
   No change to the API signature. The management server automatically decides 
full vs. incremental based on the zone configuration and the current chain 
state. Callers do not need to specify the backup type.
   
   **Existing API: `listBackups`**
   
   Response gains two new fields:
   - `backuptype` (string): `Full` or `Incremental`
   - `parentbackupid` (string): UUID of the parent backup (null for full 
backups)
   
   **Existing API: `restoreBackup`**
   
   No change. The management server resolves the full chain internally.
   
   #### 8. UI Changes
   
   - **Backup list view:** Add a "Type" column showing `Full` or `Incremental`, 
with a visual indicator (e.g., a small chain icon for incrementals).
   - **Backup detail view:** Show the backup chain as a vertical timeline: full 
backup at the top, incrementals branching down, with sizes and timestamps.
   - **Restore dialog:** When the user selects an incremental restore point, 
display a note: "This restore will replay N backups (total chain size: X GB)."
   - **Backup schedule settings** (zone-level): Toggle for incremental backup, 
full backup interval slider, max chain length input.
   
   ---
   
   ### Storage Savings Projections
   
   The following estimates assume a moderate write workload (2--5% of disk 
blocks changed per day), which is typical for application servers, databases 
with WAL, and file servers.
   
   | Scenario                          | Full Backups Only | With Incremental   
| Savings    |
   
|-----------------------------------|-------------------|--------------------|------------|
   | 500 GB VM, 7 daily backups       | 3.5 TB            | ~550 GB            
| **84%**    |
   | 1 TB VM, 30 daily backups        | 30 TB             | ~1.3 TB            
| **96%**    |
   | 100 VMs x 100 GB, weekly cycle   | 70 TB/week        | ~12 TB/week        
| **83%**    |
   | 50 VMs x 200 GB, 30-day retain   | 300 TB            | ~18 TB             
| **94%**    |
   
   For environments with higher change rates (e.g., heavy database writes), 
incremental sizes will be larger, but savings still typically exceed 60%.
   
   ---
   
   ### Requirements
   
   | Requirement             | Minimum Version | Notes                          
                |
   
|-------------------------|----------------|-------------------------------------------------|
   | QEMU                    | 4.0+           | Dirty bitmap support. Ubuntu 
20.04+, RHEL 8+.  |
   | libvirt                 | 6.0+           | `virsh backup-begin` support. 
Ubuntu 22.04+, RHEL 8.3+. |
   | CloudStack              | 4.19+          | NAS backup provider must 
already be present.    |
   | NAS storage             | NFS or SMB     | No special requirements beyond 
existing NAS backup support. |
   
   **Graceful degradation:** If a KVM host runs QEMU < 4.0 or libvirt < 6.0, 
the agent will detect this at startup and report 
`incrementalBackupCapable=false` to the management server. Backups for VMs on 
that host will remain full-only, with a warning logged. No manual intervention 
is required.
   
   ---
   
   ### Risks and Mitigations
   
   | Risk | Impact | Mitigation |
   |------|--------|------------|
   | **Bitmap corruption** (host crash during backup, QEMU bug) | Incremental 
backup produces an incomplete or incorrect image | Detect bitmap inconsistency 
via QMP query; force a new full backup and start a fresh chain. Data in 
previous full backup is unaffected. |
   | **Chain too long** (missed full backup schedule) | Restore time increases; 
single corrupt link breaks the chain | Enforce 
`nas.backup.incremental.max.chain` hard limit. If exceeded, the next backup is 
automatically a full. |
   | **Restore complexity** | User confusion about which backup to pick; longer 
restore for deep chains | Restore logic is fully automated in `nasbackup.sh`. 
The UI shows a single "Restore" button per restore point, with the chain 
replayed transparently. |
   | **VM live migration during backup** | Dirty bitmap may be lost if migrated 
mid-backup | Check VM state before backup; abort and retry if migration is in 
progress. Bitmaps persist across clean shutdowns and restarts but not across 
live migration in older QEMU versions. For QEMU 6.2+, bitmaps survive 
migration. |
   | **Backward compatibility** | Existing full-backup users should not be 
affected | Feature is disabled by default. No schema changes affect existing 
rows (new columns are nullable). Full-backup code path is unchanged. |
   | **Disk space during restore** | Flattening a chain requires temporary disk 
space equal to the full disk size | Use the same scratch space already used for 
full backup restores. Document the requirement. |
   
   ---
   
   ### Implementation Plan
   
   | Phase | Scope | Estimated Effort |
   |-------|-------|------------------|
   | **Phase 1** | Core incremental backup and restore in `nasbackup.sh` and 
KVM agent wrappers. Dirty bitmap lifecycle management. Manual testing with 
`virsh` and `qemu-img`. | 2--3 weeks |
   | **Phase 2** | Management server changes: chain management, scheduling 
logic, global settings, database schema migration, API response changes. | 2 
weeks |
   | **Phase 3** | UI changes: backup type column, chain visualization, restore 
dialog enhancements, zone-level settings. | 1 week |
   | **Phase 4** | Integration testing (full cycle: enable, backup, restore, 
disable, upgrade from older version). Edge case testing (host crash, bitmap 
loss, migration, mixed QEMU versions). Documentation. | 2 weeks |
   
   **Total estimated effort: 7--8 weeks.**
   
   We (Xcobean Systems) intend to implement this and submit PRs against the 
`main` branch. We would appreciate early design feedback before starting 
implementation to avoid rework.
   
   ---
   
   ### Prior Art
   
   - **VMware VADP / Changed Block Tracking (CBT):** VMware's CBT is the 
industry-standard approach. A change tracking driver inside the hypervisor 
records changed blocks, and backup vendors query the CBT via the vSphere API. 
This RFC's approach is analogous, using QEMU dirty bitmaps as the CBT 
equivalent.
   
   - **Proxmox Backup Server (PBS):** PBS uses QEMU dirty bitmaps to implement 
incremental backups natively. Their implementation validates that the dirty 
bitmap approach is production-ready for KVM/QEMU environments. PBS has been 
stable since Proxmox VE 6.4 (2020).
   
   - **Veeam Backup & Replication:** Veeam uses a "reverse incremental" model 
where the most recent backup is always a synthetic full, and older backups are 
stored as reverse deltas. This simplifies restore (always restore from the 
latest full) at the cost of more I/O during backup. We chose the 
forward-incremental model for simplicity and because it aligns with how QEMU 
dirty bitmaps work natively.
   
   - **libvirt backup API:** The `virsh backup-begin` command and its 
underlying `virDomainBackupBegin()` API were specifically designed for this use 
case. The libvirt documentation includes examples of incremental backup using 
dirty bitmaps. See: https://libvirt.org/kbase/incremental-backup.html
   
   ---
   
   ### About the Author
   
   Xcobean Systems Limited operates a production Apache CloudStack deployment 
providing IaaS to 50+ client VMs. We use the NAS backup provider daily and have 
contributed several improvements to it:
   
   - PR [#12805](https://github.com/apache/cloudstack/pull/12805) -- NAS backup 
NPE fix
   - PR [#12822](https://github.com/apache/cloudstack/pull/12822) -- Backup 
restore improvements
   - PR [#12826](https://github.com/apache/cloudstack/pull/12826) -- NAS backup 
script hardening
   - PRs 
[#12843](https://github.com/apache/cloudstack/pull/12843)--[#12848](https://github.com/apache/cloudstack/pull/12848)
 -- Various NAS backup fixes
   - PR [#12872](https://github.com/apache/cloudstack/pull/12872) -- Additional 
backup provider fixes
   
   We experience the storage and bandwidth cost of full-only backups firsthand 
and are motivated to solve this problem upstream rather than maintaining a fork.
   
   ---
   
   ## Open Questions for Discussion
   
   We welcome feedback from the community on the following:
   
   1. **Interest level.** Is there sufficient demand for this feature to 
justify the implementation effort? We believe so based on mailing list threads, 
but would like confirmation.
   
   2. **Dirty bitmaps vs. alternatives.** Are there concerns about relying on 
QEMU dirty bitmaps? Alternative approaches include file-level deduplication on 
the NAS (less efficient, not hypervisor-aware) or `qemu-img compare` (slower, 
requires reading both images).
   
   3. **Target release.** Should this target CloudStack 4.23, or is a later 
release more appropriate given the scope?
   
   4. **Chain model.** We proposed forward-incremental with periodic full 
backups. Would the community prefer a different model (e.g., 
reverse-incremental like Veeam, or forever-incremental with periodic synthetic 
fulls)?
   
   5. **Scope of first PR.** Should we submit the entire feature as one PR, or 
break it into smaller PRs (e.g., nasbackup.sh changes first, then agent, then 
management server, then UI)?
   
   6. **Testing infrastructure.** We can test against our production 
environment (Ubuntu 22.04, QEMU 6.2, libvirt 8.0). Are there CI environments or 
community test labs available for broader testing (RHEL, Rocky, older QEMU 
versions)?
   
   ---
   
   *This RFC is posted as a GitHub Discussion to gather community feedback 
before implementation begins. Please share your thoughts, concerns, and 
suggestions.*
   
   ---
   
   ## Appendix: Related Proposal — CloudStack Infrastructure Backup to NAS
   
   ### Problem
   CloudStack's NAS backup provider only backs up VM disks. The management 
server database, agent configurations, SSL certificates, and global settings 
are not backed up. If the management server fails, all metadata is lost unless 
someone manually configured mysqldump.
   
   ### Proposed Solution
   Add a new scheduled task that automatically backs up CloudStack 
infrastructure to the same NAS backup storage.
   
   **What gets backed up:**
   | Component | Method | Size |
   |-----------|--------|------|
   | CloudStack database (`cloud`, `cloud_usage`) | mysqldump | ~50-500MB |
   | Management server config (`/etc/cloudstack/management/`) | tar | <1MB |
   | Agent configs (`/etc/cloudstack/agent/`) | tar | <1MB |
   | SSL certificates and keystores | tar | <1MB |
   | Global settings export | SQL dump | <1MB |
   
   **Configuration:**
   - `nas.infra.backup.enabled` (global, default: false)
   - `nas.infra.backup.schedule` (cron expression, default: `0 2 * * *` — daily 
at 2am)
   - `nas.infra.backup.retention` (number of backups to keep, default: 7)
   
   **Implementation:**
   - New class: `InfrastructureBackupTask` extending `ManagedContextRunnable`
   - Runs on management server (not KVM agent)
   - Uses existing NAS mount point from backup storage pool
   - Creates timestamped directory: `infra-backup/2026-03-27/`
   - Runs `mysqldump --single-transaction` for hot backup
   - Tars config directories
   - Manages retention (delete backups older than N days)
   - Logs to CloudStack events for audit trail
   
   **Restore:** 
   - Manual via CLI: `mysql cloud < backup.sql` + extract config tars
   - Future: one-click restore from UI
   
   This is a much simpler change (~200 lines of Java) that addresses a real 
operational gap. Could target 4.22.1 or 4.23.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to