On 26 Jan 2010, at 12:00 pm, Jonathan Aquilina wrote:

does anyone have any benchmarks for I/O in a virtualized cluster?

I don't have formal benchmarks, but I can tell you what I see on my VMware virtual machines in general:

Network I/O is reasonably fast - there's some additional latency, but nothing particularly severe. VMware can special-case communication between VMs on the same physical host, if required, but that reduces flexibility in moving the VMs around.

Disk I/O is fairly poor, especially once the number of virtual machines becomes large. This is hardly surprising - the VMs are contending for shared resources, and there's bound to be more contention in a virtualised setup than in physical machines.
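I don't have numbers to hand, but if you want a rough feel rather than a formal benchmark, something like the following run inside a guest and then on a comparable physical box shows the overhead for streaming I/O (the path and sizes are just placeholders, and fio will give you far better small-random-I/O figures if you have it installed):

    # sequential write, with O_DIRECT so you measure the storage path rather than the page cache
    dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
    # sequential read back
    dd if=/tmp/ddtest of=/dev/null bs=1M iflag=direct
    rm /tmp/ddtest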

In our case (~170 virtual machines running on 9 physical servers, each of which has dual GigE for VM traffic and dual-port fibrechannel) I've noted a few things. Forgive me for using VMware parlance rather than Xen, but hopefully the ideas will be the same:

1) Applications whose I/O pattern is a large number of small disk operations are particularly painful (such as our ganglia server, with its thousands of tiny updates to RRD files). We've mitigated this by configuring Linux on that guest to allow a much larger proportion of dirty pages than usual, and not to flush to disk quite so often (see the sysctl sketch after this list). I risk losing more data if the VM goes pop, but as this is just ganglia graphing I don't care too much in that particular case.

2) Raw device maps (where you pass a LUN straight through to a single virtual machine, rather than carving the disk out of a datastore) reduce contention and increase performance somewhat, at the cost of using up device minor numbers on ESX quite quickly; because ESX is basically Linux, you're limited to 256 (I think - it might be 128) LUNs presented to each host, and probably to each cluster, since VMs need to be able to migrate. I basically use RDMs for database applications where the storage requirements are greater than about 500 GB. For less than that I use datastores.

3) Keep the number of virtual machines per datastore quite low, especially if the applications are I/O heavy, to reduce contention.

4) In an ideal world I'd spread the datastores over a larger number of RAID units than I currently have, but my budget can't stand that.
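For the dirty-page tuning mentioned in 1), the knobs are the standard Linux VM sysctls. The values below are purely illustrative, not what we actually run:

    # /etc/sysctl.conf on the ganglia guest (illustrative values)
    vm.dirty_ratio = 60                  # let dirty pages reach 60% of RAM before writers are forced to flush
    vm.dirty_background_ratio = 40       # start background writeback later than the default
    vm.dirty_expire_centisecs = 6000     # dirty data may sit for 60s before it must be written out
    vm.dirty_writeback_centisecs = 1500  # wake the writeback threads every 15s instead of every 5s

Load them with 'sysctl -p' (or echo values into /proc/sys/vm/). The caveat from 1) applies: the more you let accumulate in memory, the more you stand to lose when the VM goes pop.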

All this is rather dependent of course on what technology you're using to provide storage to your virtual machines. We're using fibrechannel, but of course mileage may vary considerably if you use NAS or iSCSI, and depending on how many NICs you're bonding together to get bandwidth.
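Purely as an illustration of the bonding point (the interface names and the choice of 802.3ad are assumptions, and your switch needs to support LACP for that mode), aggregating two GigE ports on a Linux client with the standard bonding driver looks something like:

    # /etc/modprobe.conf (or a file under modprobe.d) - illustrative
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # bring up the bond and enslave the physical NICs
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    ifenslave bond0 eth0 eth1

Bear in mind that bonding aggregates bandwidth across multiple streams; a single TCP connection still only sees one link's worth.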




--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.