On 18.12.18 at 11:48, Hector Martin wrote:
> On 18/12/2018 18:28, Oliver Freyermuth wrote:
>> We have yet to observe these hangs; we have been running this with ~5 VMs with
>> ~10 disks for about half a year now, with daily snapshots. But all of these VMs
>> have very "low" I/O, since we put anything I/O-intensive on bare metal (but
>> with automated provisioning, of course).
>>
>> So I'll chime in on your question, especially since there might be VMs on
>> our cluster in the future whose guest OS may not be running an agent.
>> Since we have not observed this yet, I'll also ask: what's your "scale"? Is it
>> hundreds of VMs / disks? Hourly snapshots? I/O-intensive VMs?
> 
> 5 hosts, 15 VMs, daily snapshots. I/O is variable (customer workloads); 
> usually not that high, but it can easily peak at 100% when certain things 
> happen. We don't have great I/O performance (RBD over 1gbps links to HDD 
> OSDs).
> 
> I'm poring through monitoring graphs now and I think the issue this time 
> around was just too much dirty data in the page cache of a guest. The VM that 
> failed spent 3 minutes flushing out writes to disk before its I/O was 
> quiesced, at around 100 IOPS throughput (the actual data throughput was low, 
> though, so small writes). That exceeded our timeout and then things went 
> south from there.
> 
> I wasn't sure if fsfreeze did a full sync to disk, but given the I/O behavior 
> I'm seeing that seems to be the case. Unfortunately coming up with an upper 
> bound for the freeze time seems tricky now. I'm increasing our timeout to 15 
> minutes, we'll see if the problem recurs.
> 
> Given this, it makes even more sense to just avoid the freeze if at all 
> reasonable. There's no real way to guarantee that a fsfreeze will complete in 
> a "reasonable" amount of time as far as I can tell.

Potentially, if the guest agent is allowed to execute arbitrary commands, you
could check how much dirty data the guest is holding (there may be a better
interface than parsing meminfo...):
  cat /proc/meminfo | grep -i dirty
  Dirty:             19476 kB
From that you could estimate how long the fsfreeze may take (ideally combined
with the allowed IOPS).
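
A minimal sketch of that via virsh qemu-agent-command and the agent's guest-exec
interface (assumptions on my side: a domain named "myvm", guest-exec not disabled
in qemu-ga, ~4 kB per write and a 100 IOPS budget; a real script should poll
guest-exec-status until the command has exited):

  # Run grep inside the guest through the agent and remember the PID.
  CMD='{"execute":"guest-exec","arguments":{"path":"/bin/sh",
        "arg":["-c","grep ^Dirty: /proc/meminfo"],"capture-output":true}}'
  PID=$(virsh qemu-agent-command myvm "$CMD" | jq -r '.return.pid')
  sleep 2   # simplification; better: poll until "exited" is true
  # Fetch the base64-encoded output and extract the kB value.
  STATUS="{\"execute\":\"guest-exec-status\",\"arguments\":{\"pid\":$PID}}"
  DIRTY_KB=$(virsh qemu-agent-command myvm "$STATUS" \
             | jq -r '.return."out-data"' | base64 -d | awk '{print $2}')
  # Very rough estimate: ~4 kB per write at an assumed budget of 100 IOPS.
  echo "expect roughly $(( DIRTY_KB / 4 / 100 )) s to flush ${DIRTY_KB} kB"
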
Of course, if you have control over your VMs, you may also play with the
vm.dirty_ratio and vm.dirty_background_ratio sysctls.
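
For example (numbers purely illustrative, not a recommendation), the *_bytes
variants give an absolute bound that is independent of the guest's RAM size:

  # /etc/sysctl.d/99-dirty-limits.conf (inside the guest)
  # Start background writeback once 64 MiB are dirty...
  vm.dirty_background_bytes = 67108864
  # ...and block writers once 256 MiB are dirty (this overrides vm.dirty_ratio).
  vm.dirty_bytes = 268435456

With a fixed IOPS budget, a bounded amount of dirty data also bounds the flush
(and thus the freeze) time.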

Interestingly, tuned's "virtual-guest" profile on CentOS 7 sets:
  vm.dirty_ratio = 30
(the default is 20 %), i.e. they optimize for performance by allowing more dirty
buffers and delaying writeback even further.
They take the opposite approach in their "virtual-host" profile:
  vm.dirty_background_ratio = 5
(the default is 10 %).
I believe these choices are good for performance, but they may increase the time
it takes to freeze the VMs, especially if IOPS are limited and there is a lot of
dirty data.
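
If tuned is in use, a small override profile is probably cleaner than editing
sysctl settings by hand; a sketch (profile name and values are just my own pick):

  # /etc/tuned/virtual-guest-lowdirty/tuned.conf
  [main]
  include=virtual-guest

  [sysctl]
  vm.dirty_ratio=10
  vm.dirty_background_ratio=3

Activating it with "tuned-adm profile virtual-guest-lowdirty" keeps the rest of
the virtual-guest tuning and only overrides the two dirty-memory sysctls.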

Since we also have 1 Gbps links and HDD OSDs, and plan to add more and more VMs 
and hosts, we may also observe this one day... 
So I'm curious:
How did you implement the timeout in your case? Are you using qemu-agent-command
to issue the fsfreeze with --async and --timeout instead of domfsfreeze?
We are using domfsfreeze for now, which (probably) has an infinite timeout, or at
least no timeout is documented in its manpage.
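
What I had in mind is roughly the following (just a sketch; "myvm" and the
15-minute budget are placeholders, and if the freeze request times out on the
libvirt side the guest may still end up frozen, so an unconditional thaw / error
path would be needed):

  # Bound the wait for the agent's reply, snapshot, then thaw again.
  virsh qemu-agent-command --timeout 900 myvm \
      '{"execute":"guest-fsfreeze-freeze"}'
  # ... take the RBD snapshot here ...
  virsh qemu-agent-command --timeout 60 myvm \
      '{"execute":"guest-fsfreeze-thaw"}'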

Cheers,
        Oliver
