Package: libvirt-daemon
Version: 11.3.0-3
Severity: important
Tags: security
X-Debbugs-Cc: [email protected], Debian Security Team 
<[email protected]>

Context: quick review of what happens if you 'virsh snapshot-delete' an 
external disk snapshot when the virtual disk has the properties discard='unmap' 
and detect_zeroes='off':

# # We're going to make this disk image small so we don't have a Bad Day
# qemu-img create -f qcow2 /var/lib/libvirt/images/trogdor.qcow2 5G
# # [put a base install of Debian or similar into this disk image]
# grep -A3 "device='disk'" trogdor.xml
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='off'/>
      <source file='/var/lib/libvirt/images/trogdor.qcow2'/>
      <target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second 
one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# # Now delete the second one
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
324K    /var/lib/libvirt/images/trogdor-snap1.qcow2
2.1M    /var/lib/libvirt/images/trogdor-snap2.qcow2
456K    /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
3.0G    /var/lib/libvirt/images/trogdor-snap1.qcow2
456K    /var/lib/libvirt/images/trogdor-snap3.qcow2

Oops. The blockcommit implicit in deleting the snapshot turned all the TRIM'd 
zeroes into actual zeros because detect_zeroes is off. Which is probably a bug 
in itself; it would make sense to have discard='unmap' set 
detect_zeroes='unmap' during blockcommit/blockpull even if it doesn't the rest 
of the time. This is obviously a much bigger problem when VMs have 3TB of free 
space instead of 3GB. We must never do that. Whereas this is what happens (or 
should happen) if you do the same thing with detect_zeroes='unmap':

# grep -A3 "device='disk'" trogdor.xml
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='unmap'/>
      <source file='/var/lib/libvirt/images/trogdor.qcow2'/>
      <target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second 
one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# # IT said try turning it off and on again
# virsh shutdown trogdor
Domain 'trogdor' is being shutdown
# # wait for it to shut down
# virsh start trogdor
Domain 'trogdor' started
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec 
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
5.4M    /var/lib/libvirt/images/trogdor-snap1.qcow2
3.9M    /var/lib/libvirt/images/trogdor-snap2.qcow2
200K    /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
5.8M    /var/lib/libvirt/images/trogdor-snap1.qcow2
1.6M    /var/lib/libvirt/images/trogdor-snap3.qcow2

Now the merged snapshot is a much more appropriate size. Notice the VM shutdown 
and start between creating the first snapshot and the second. That's to work 
around the bug, because the bug is that detect_zeroes isn't enabled when it 
ought to be. It's related to these lines in qemuDomainPrepareDiskSourceData() 
in src/qemu/qemu_domain.c:

    /* transfer properties valid only for the top level image */
    if (src == disk->src || src == disk->src->dataFileStore)
        src->detect_zeroes = disk->detect_zeroes;

The conditional here might have been crafted from a place of love, but it ain't 
right. The function is meant to copy properties from 'disk' to 'src'. What the 
detect_zeroes property does is check if data being written is all zeroes and 
then if it is, uses unmap instead of actually writing the zeroes. Typically 
writes only go to the top-level image so if none of the others ever get written 
to then they don't need detect_zeroes to be enabled, right? That's the theory?

There are three problems with this. The first is that it's apparently 
fruitless. Enabling detect_zeroes is somewhat computationally expensive so the 
theory might be to turn it off for read-only images where it isn't needed, but 
if they're actually read-only then they shouldn't be getting any writes anyway 
and then there shouldn't be any computation to be saved by turning it off.

The second is that the backing files aren't always read-only, and in particular 
live blockcommit writes to them and very much wants detect_zeroes='unmap'. This 
is where you get the denial of service, because it happens during live snapshot 
deletion. A user could create arbitrarily many snapshots of trivial size and 
TRIM them without exceeding a storage quota and the disk usage would only 
increase by a factor of a thousand when deleting, which renders adversarial the 
operation that should normally be expected to free up space. Your storage gets 
filled with zeroes even when you set detect_zeroes='unmap' and your VMs crash 
because your host filesystem is completely full.

The third is that the check has a TOCTOU fail which is triggered during live 
snapshot creation. It seems to do what it claims to during guest boot but 
during snapshot creation that function is called against a new snapshot which 
is about to be the top level image but 'disk->src' isn't set to 'src' until 
qemuSnapshotDiskUpdateSource() which is after the call to 
qemuDomainPrepareDiskSourceData(). With the result that detect_zeroes for what 
is about to be the top-level image is left as its default value, which is 
'off'. So detect_zeroes gets erroneously disabled for the live top-level image 
whenever you create a live snapshot. It also stays enabled for what was the 
top-level image when the guest booted. This is why snapshot-delete "worked" 
after shutting down the guest so it could boot with the top-level snapshot as 
the one we would ultimately merge the deleted one into -- detect_zeroes at that 
point was still 'unmap' for snap1 instead of the then-top level snap3.

All of this seems to be sorted by removing the conditional in 
qemuDomainPrepareDiskSourceData() so that 'src->detect_zeroes = 
disk->detect_zeroes;' happens unconditionally. That and updating a few tests in 
tests/qemublocktestdata/xml2json to expect detect_zeroes to be present in the 
relevant JSON causes detect_zeroes='unmap' to work as expected with 
blockcommit/snapshot-delete.

If for some reason that change isn't desirable, the most irksome of the trouble 
could be eliminated by having qemuBlockCommit(), and therefore snapshot-delete, 
temporarily set detect_zeroes='unmap' on the commit target destination image 
whenever discard='unmap' is set. This would be good to do even if the 
conditional in qemuDomainPrepareDiskSourceData() *is* removed, because 
blockcommit causes disk images to explode with zeroes whenever discard='unmap' 
and detect_zeroes isn't, and that isn't desirable even if you don't want to 
constantly pay the cost of enabling detect_zeroes on the live top image. Or to 
put it another way, you shouldn't have to enable it on the live top image if 
you only want it during blockcommit/snapshot-delete, but you will want it 
during blockcommit whenever discard is enabled. 

-- System Information:
Debian Release: 13.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.48+deb13-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libvirt-daemon depends on:
ii  libc6                  2.41-12
ii  libgcc-s1              14.2.0-19
ii  libglib2.0-0t64        2.84.4-3~deb13u1
ii  libtirpc3t64           1.3.6+ds-1
ii  libvirt-common         11.3.0-3
ii  libvirt-daemon-common  11.3.0-3
ii  libvirt-daemon-log     11.3.0-3
ii  libvirt0               11.3.0-3
ii  libxml2                2.12.7+dfsg+really2.9.14-2.1+deb13u1
ii  logrotate              3.22.0-1

Versions of packages libvirt-daemon recommends:
ii  libvirt-daemon-driver-interface        11.3.0-3
ii  libvirt-daemon-driver-network          11.3.0-3
ii  libvirt-daemon-driver-nodedev          11.3.0-3
ii  libvirt-daemon-driver-nwfilter         11.3.0-3
ii  libvirt-daemon-driver-qemu             11.3.0-3
ii  libvirt-daemon-driver-secret           11.3.0-3
ii  libvirt-daemon-driver-storage          11.3.0-3
ii  libvirt-daemon-driver-storage-disk     11.3.0-3
ii  libvirt-daemon-driver-storage-iscsi    11.3.0-3
ii  libvirt-daemon-driver-storage-logical  11.3.0-3
ii  libvirt-daemon-driver-storage-mpath    11.3.0-3
ii  libvirt-daemon-driver-storage-scsi     11.3.0-3
ii  libvirt-daemon-lock                    11.3.0-3
ii  libvirt-daemon-plugin-lockd            11.3.0-3

Versions of packages libvirt-daemon suggests:
ii  libvirt-daemon-driver-lxc                   11.3.0-3
pn  libvirt-daemon-driver-storage-gluster       <none>
pn  libvirt-daemon-driver-storage-iscsi-direct  <none>
pn  libvirt-daemon-driver-storage-rbd           <none>
pn  libvirt-daemon-driver-storage-zfs           <none>
ii  libvirt-daemon-driver-vbox                  11.3.0-3
ii  libvirt-daemon-driver-xen                   11.3.0-3
ii  libvirt-daemon-system                       11.3.0-3

-- no debconf information

Reply via email to