Dear Hector,

We are using the very same approach on CentOS 7 (freeze + thaw), but preceded 
by an fstrim. With virtio-scsi, fstrim propagates the discards from 
within the VM to Ceph RBD (if qemu is configured accordingly),
and a lot of space is saved.
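
For illustration, the sequence described above (trim, freeze, snapshot, thaw) could be sketched roughly as follows. This is not our actual script: it assumes the trim is issued through the guest agent with `virsh domfstrim` and the snapshot with `rbd snap create`, and the domain, image and snapshot names are hypothetical placeholders.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def trim_and_snapshot(domain: str, image_spec: str, snap_name: str,
                      runner: Runner = run) -> None:
    """Trim, freeze, snapshot and thaw one guest disk.

    domain, image_spec ("pool/image") and snap_name are placeholders
    chosen for this sketch, not names from a real setup.
    """
    # Discard unused blocks first so the freed space is released on RBD
    # before the snapshot is taken.
    runner(["virsh", "domfstrim", domain])
    try:
        # Ask the guest agent to freeze all mounted filesystems.
        runner(["virsh", "domfsfreeze", domain])
        runner(["rbd", "snap", "create", f"{image_spec}@{snap_name}"])
    finally:
        # Always attempt the thaw, even if the freeze or snapshot failed.
        runner(["virsh", "domfsthaw", domain])
```

A call would look like `trim_and_snapshot("vm1", "rbd/vm1-disk0", "daily")`. The `runner` parameter only exists so the command sequence can be checked without touching a real VM.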

We have yet to observe these hangs. We have been running this on ~5 VMs with ~10 disks for 
about half a year now, with daily snapshots. But all of these VMs have very 
low I/O, since we put anything I/O-intensive on bare metal (but with automated 
provisioning, of course).

So I'll chime in on your question, especially since there might be VMs on our 
cluster in the future where the guest OS may not be running an agent.
Since we have not observed this yet, I'll also ask: what's your "scale"? Is it 
hundreds of VMs / disks? Hourly snapshots? I/O-intensive VMs?

Cheers,
        Oliver

On 18.12.18 at 10:10, Hector Martin wrote:
Hi list,

I'm running libvirt qemu guests on RBD, and currently taking backups by issuing 
a domfsfreeze, taking a snapshot, and then issuing a domfsthaw. This seems to 
be a common approach.

This is safe, but it's impactful: the guest has frozen I/O for the duration of 
the snapshot. This is usually only a few seconds. Unfortunately, the freeze 
action doesn't seem to be very reliable. Sometimes it times out, leaving the 
guest in a messy situation with frozen I/O (thaw times out too when this 
happens, or returns success but FSes end up frozen anyway). This is clearly a 
bug somewhere, but I wonder whether the freeze is a hard requirement or not.
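
One way to detect the "thaw returned success but the filesystems are still frozen" case would be to ask the guest agent for its freeze state after the thaw. The sketch below assumes the agent still responds and uses the QEMU guest agent command `guest-fsfreeze-status` through `virsh qemu-agent-command`; the function names are made up for this example.

```python
import json
import subprocess


def parse_freeze_status(agent_reply: str) -> str:
    """Extract the state ("frozen" or "thawed") from the agent's JSON reply."""
    return json.loads(agent_reply)["return"]


def guest_is_frozen(domain: str) -> bool:
    """Ask the guest agent whether the guest's filesystems are still frozen."""
    reply = subprocess.run(
        ["virsh", "qemu-agent-command", domain,
         '{"execute": "guest-fsfreeze-status"}'],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_freeze_status(reply) == "frozen"
```

If `guest_is_frozen("vm1")` is still true after a thaw, the backup job can at least raise an alert instead of silently leaving the guest stuck.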

Are there any atomicity guarantees for RBD snapshots taken *without* freezing 
the filesystem? Obviously the filesystem will be dirty and will require journal 
recovery, but that is okay; it's equivalent to a hard shutdown/crash. But is 
there any chance of corruption related to the snapshot being taken in a 
non-atomic fashion? Filesystems and applications these days should have no 
trouble with hard shutdowns, as long as storage writes follow ordering 
guarantees (no writes getting reordered across a barrier and such).

Put another way: do RBD snapshots have ~identical atomicity guarantees to e.g. 
LVM snapshots?

If we can get away without the freeze, honestly I'd rather go that route. If I 
really need to pause I/O during the snapshot creation, I might end up resorting 
to pausing the whole VM (suspend/resume), which has higher impact but also 
probably a much lower chance of messing up (or having excess latency), since it 
doesn't involve the guest OS or the qemu agent at all...
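
The suspend/resume alternative could be sketched like this, assuming `virsh suspend` and `virsh resume` around an `rbd snap create`. The names are placeholders. Since the guest's dirty page cache is not flushed while the VM is paused, the resulting snapshot is still only crash-consistent.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def vm_paused(domain: str, runner: Runner = run) -> Iterator[None]:
    """Pause the whole VM for the duration of the with-block."""
    runner(["virsh", "suspend", domain])
    try:
        yield
    finally:
        # Resume even if the snapshot inside the block failed.
        runner(["virsh", "resume", domain])


def snapshot_while_paused(domain: str, image_spec: str, snap_name: str,
                          runner: Runner = run) -> None:
    """Take an RBD snapshot while the VM is paused (no guest agent involved)."""
    with vm_paused(domain, runner):
        runner(["rbd", "snap", "create", f"{image_spec}@{snap_name}"])
```

This keeps the guest OS and the agent out of the picture entirely, at the cost of stalling the whole VM rather than just its filesystem I/O.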




_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
