On Tue, Jun 18, 2013 at 12:00:07PM +0800, Mei EL Liu wrote:
> Hi,
>
> I found that it's a little difficult to detect IO bandwidth
> congestion in oVirt storage domains backed by NFS or GlusterFS.
>
> For block-based storage it's easier to detect, since you can use a
> tool like iostat. For file-system-based storage it's much harder.
>
> I investigated the existing solutions. vSphere uses average IO
> latency to detect it. I propose a similar scheme in
> http://www.ovirt.org/Features/Design/SLA_for_storage_io_bandwidth .
> It simplifies the scheme by making the congestion decision on a
> single host instead of using statistics from all the hosts that use
> the backend storage. It doesn't need communication between hosts;
> maybe in phase two we can add communication and make a global
> decision.
>
> For now, it detects congestion via statistics from the VMs on the
> local host that use that backend storage (this info is collected
> through iostat inside each VM). It collects the IO latency of those
> VMs and computes an average latency for that backend storage. If the
> average is higher than a threshold, congestion is declared.
>
> However, when I tested this policy, I found that setting the IO
> limit to a smaller value makes latency longer. That means that when
> the average latency exceeds the threshold, our automatic tuning
> decreases the IO limit, which makes the average IO latency even
> longer. Of course, the latency then exceeds the threshold again and
> the IO limit is decreased again. This finally drives the IO limit
> down to its lower bound.
>
> This scheme is affected by the following factors:
> 1. We collect statistics from the VMs instead of the host. (This is
> because it is hard to collect such info for remote storage like NFS
> or GlusterFS.)
> 2. The IO limit affects the latency.
> 3. The threshold is a constant.
> 4. I also find that iostat's await (latency info) is not good
> enough, since it is long for both very light and very heavy IO.
>
> Does anybody have an idea or experience with this? Suggestions are
> more than welcome. Thanks in advance.
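To make the quoted scheme concrete, it boils down to roughly the
following per-host check (a Python sketch; the names and the
threshold value are illustrative, not code from the design page):

# Sketch of the per-host congestion check described above. Input is
# assumed to be the iostat "await" values (ms) collected inside each
# local VM that uses the given backend storage.

CONGESTION_THRESHOLD_MS = 50.0  # a constant -- problem 3 above


def domain_is_congested(vm_await_ms, threshold_ms=CONGESTION_THRESHOLD_MS):
    """vm_await_ms: one 'await' sample per local VM whose disks live
    on this storage domain."""
    if not vm_await_ms:
        return False
    avg = sum(vm_await_ms) / len(vm_await_ms)
    return avg > threshold_ms


# The feedback loop described above: when this returns True we lower
# the IO limit, which raises the awaits, so the check fires again and
# the limit ratchets down to its lower bound.
print(domain_is_congested([42.0, 61.5, 75.2]))  # True -> limit decreased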
Thank you for such a thorough introduction to this topic. One thought I
had is that maybe we need to invert your logic with respect to IO
throttling. The way it could work is:

1) At the datacenter level, we establish a throughput range. VMs are
   guaranteed the minimum and won't exceed the maximum. Similarly, we
   set a range for latency.

2) Hosts continuously reduce the allowable bandwidth for VMs down to
   the throughput minimum. If latency rises above the allowable limit
   for a single VM, slowly increase its bandwidth up to the allowable
   maximum.

3) If, over time, the IO remains congested, you can:
   a) Decrease the cluster-wide throughput minimum and maximum values
   b) Increase the maximum allowable latency
   c) Migrate VM disks to alternate storage
   d) Upgrade storage
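In code terms, the inverted loop in (1) and (2) might look something
like this (a Python sketch; the names, step sizes, and values are
illustrative, not anything from oVirt):

# Datacenter-level policy (step 1): a guaranteed/maximum throughput
# range plus a maximum acceptable latency.
TPUT_MIN_MBPS = 10
TPUT_MAX_MBPS = 100
LATENCY_MAX_MS = 50.0

STEP_DOWN_MBPS = 5  # how quickly hosts reclaim bandwidth
STEP_UP_MBPS = 2    # how gently a suffering VM gets headroom


def adjust_limit(current_limit_mbps, vm_latency_ms):
    """One iteration of the per-VM loop (step 2)."""
    if vm_latency_ms > LATENCY_MAX_MS:
        # This VM's latency is out of range: slowly increase its
        # bandwidth, up to the allowable maximum.
        return min(current_limit_mbps + STEP_UP_MBPS, TPUT_MAX_MBPS)
    # Otherwise keep reducing toward the guaranteed minimum.
    return max(current_limit_mbps - STEP_DOWN_MBPS, TPUT_MIN_MBPS)

If most VMs end up pinned at the minimum while latency stays out of
range, that is the cue for the escalations in step 3.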
--
Adam Litke <[email protected]>
IBM Linux Technology Center