Package: libvirt-daemon
Version: 11.3.0-3
Severity: important
Tags: security
X-Debbugs-Cc: [email protected], Debian Security Team
<[email protected]>
Context: quick review of what happens if you 'virsh snapshot-delete' an
external disk snapshot when the virtual disk has the properties discard='unmap'
and detect_zeroes='off':
# # We're going to make this disk image small so we don't have a Bad Day
# qemu-img create -f qcow2 /var/lib/libvirt/images/trogdor.qcow2 5G
# # [put a base install of Debian or similar into this disk image]
# grep -A3 "device='disk'" trogdor.xml
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='off'/>
<source file='/var/lib/libvirt/images/trogdor.qcow2'/>
<target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second
one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# # Now delete the second one
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G /var/lib/libvirt/images/trogdor.qcow2
324K /var/lib/libvirt/images/trogdor-snap1.qcow2
2.1M /var/lib/libvirt/images/trogdor-snap2.qcow2
456K /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G /var/lib/libvirt/images/trogdor.qcow2
3.0G /var/lib/libvirt/images/trogdor-snap1.qcow2
456K /var/lib/libvirt/images/trogdor-snap3.qcow2
Oops. The blockcommit implicit in deleting the snapshot turned all the TRIM'd
zeroes into actual zeros because detect_zeroes is off. Which is probably a bug
in itself; it would make sense to have discard='unmap' set
detect_zeroes='unmap' during blockcommit/blockpull even if it doesn't the rest
of the time. This is obviously a much bigger problem when VMs have 3TB of free
space instead of 3GB. We must never do that. Whereas this is what happens (or
should happen) if you do the same thing with detect_zeroes='unmap':
# grep -A3 "device='disk'" trogdor.xml
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='unmap'/>
<source file='/var/lib/libvirt/images/trogdor.qcow2'/>
<target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second
one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# # IT said try turning it off and on again
# virsh shutdown trogdor
Domain 'trogdor' is being shutdown
# # wait for it to shut down
# virsh start trogdor
Domain 'trogdor' started
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec
vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G /var/lib/libvirt/images/trogdor.qcow2
5.4M /var/lib/libvirt/images/trogdor-snap1.qcow2
3.9M /var/lib/libvirt/images/trogdor-snap2.qcow2
200K /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G /var/lib/libvirt/images/trogdor.qcow2
5.8M /var/lib/libvirt/images/trogdor-snap1.qcow2
1.6M /var/lib/libvirt/images/trogdor-snap3.qcow2
Now the merged snapshot is a much more appropriate size. Notice the VM shutdown
and start between creating the first snapshot and the second. That's to work
around the bug, because the bug is that detect_zeroes isn't enabled when it
ought to be. It's related to these lines in qemuDomainPrepareDiskSourceData()
in src/qemu/qemu_domain.c:
/* transfer properties valid only for the top level image */
if (src == disk->src || src == disk->src->dataFileStore)
src->detect_zeroes = disk->detect_zeroes;
The conditional here might have been crafted from a place of love, but it ain't
right. The function is meant to copy properties from 'disk' to 'src'. What the
detect_zeroes property does is check if data being written is all zeroes and
then if it is, uses unmap instead of actually writing the zeroes. Typically
writes only go to the top-level image so if none of the others ever get written
to then they don't need detect_zeroes to be enabled, right? That's the theory?
There are three problems with this. The first is that it's apparently
fruitless. Enabling detect_zeroes is somewhat computationally expensive so the
theory might be to turn it off for read-only images where it isn't needed, but
if they're actually read-only then they shouldn't be getting any writes anyway
and then there shouldn't be any computation to be saved by turning it off.
The second is that the backing files aren't always read-only, and in particular
live blockcommit writes to them and very much wants detect_zeroes='unmap'. This
is where you get the denial of service, because it happens during live snapshot
deletion. A user could create arbitrarily many snapshots of trivial size and
TRIM them without exceeding a storage quota and the disk usage would only
increase by a factor of a thousand when deleting, which renders adversarial the
operation that should normally be expected to free up space. Your storage gets
filled with zeroes even when you set detect_zeroes='unmap' and your VMs crash
because your host filesystem is completely full.
The third is that the check has a TOCTOU fail which is triggered during live
snapshot creation. It seems to do what it claims to during guest boot but
during snapshot creation that function is called against a new snapshot which
is about to be the top level image but 'disk->src' isn't set to 'src' until
qemuSnapshotDiskUpdateSource() which is after the call to
qemuDomainPrepareDiskSourceData(). With the result that detect_zeroes for what
is about to be the top-level image is left as its default value, which is
'off'. So detect_zeroes gets erroneously disabled for the live top-level image
whenever you create a live snapshot. It also stays enabled for what was the
top-level image when the guest booted. This is why snapshot-delete "worked"
after shutting down the guest so it could boot with the top-level snapshot as
the one we would ultimately merge the deleted one into -- detect_zeroes at that
point was still 'unmap' for snap1 instead of the then-top level snap3.
All of this seems to be sorted by removing the conditional in
qemuDomainPrepareDiskSourceData() so that 'src->detect_zeroes =
disk->detect_zeroes;' happens unconditionally. That and updating a few tests in
tests/qemublocktestdata/xml2json to expect detect_zeroes to be present in the
relevant JSON causes detect_zeroes='unmap' to work as expected with
blockcommit/snapshot-delete.
If for some reason that change isn't desirable, the most irksome of the trouble
could be eliminated by having qemuBlockCommit(), and therefore snapshot-delete,
temporarily set detect_zeroes='unmap' on the commit target destination image
whenever discard='unmap' is set. This would be good to do even if the
conditional in qemuDomainPrepareDiskSourceData() *is* removed, because
blockcommit causes disk images to explode with zeroes whenever discard='unmap'
and detect_zeroes isn't, and that isn't desirable even if you don't want to
constantly pay the cost of enabling detect_zeroes on the live top image. Or to
put it another way, you shouldn't have to enable it on the live top image if
you only want it during blockcommit/snapshot-delete, but you will want it
during blockcommit whenever discard is enabled.
-- System Information:
Debian Release: 13.1
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.12.48+deb13-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libvirt-daemon depends on:
ii libc6 2.41-12
ii libgcc-s1 14.2.0-19
ii libglib2.0-0t64 2.84.4-3~deb13u1
ii libtirpc3t64 1.3.6+ds-1
ii libvirt-common 11.3.0-3
ii libvirt-daemon-common 11.3.0-3
ii libvirt-daemon-log 11.3.0-3
ii libvirt0 11.3.0-3
ii libxml2 2.12.7+dfsg+really2.9.14-2.1+deb13u1
ii logrotate 3.22.0-1
Versions of packages libvirt-daemon recommends:
ii libvirt-daemon-driver-interface 11.3.0-3
ii libvirt-daemon-driver-network 11.3.0-3
ii libvirt-daemon-driver-nodedev 11.3.0-3
ii libvirt-daemon-driver-nwfilter 11.3.0-3
ii libvirt-daemon-driver-qemu 11.3.0-3
ii libvirt-daemon-driver-secret 11.3.0-3
ii libvirt-daemon-driver-storage 11.3.0-3
ii libvirt-daemon-driver-storage-disk 11.3.0-3
ii libvirt-daemon-driver-storage-iscsi 11.3.0-3
ii libvirt-daemon-driver-storage-logical 11.3.0-3
ii libvirt-daemon-driver-storage-mpath 11.3.0-3
ii libvirt-daemon-driver-storage-scsi 11.3.0-3
ii libvirt-daemon-lock 11.3.0-3
ii libvirt-daemon-plugin-lockd 11.3.0-3
Versions of packages libvirt-daemon suggests:
ii libvirt-daemon-driver-lxc 11.3.0-3
pn libvirt-daemon-driver-storage-gluster <none>
pn libvirt-daemon-driver-storage-iscsi-direct <none>
pn libvirt-daemon-driver-storage-rbd <none>
pn libvirt-daemon-driver-storage-zfs <none>
ii libvirt-daemon-driver-vbox 11.3.0-3
ii libvirt-daemon-driver-xen 11.3.0-3
ii libvirt-daemon-system 11.3.0-3
-- no debconf information