On Wed, May 25, 2016 at 6:17 PM, David Caro <[email protected]> wrote:
> On 05/25 17:06, David Caro wrote:
> > On 05/25 16:09, Barak Korren wrote:
> > > On 25 May 2016 at 14:52, David Caro <[email protected]> wrote:
> > > > On 05/25 14:42, Barak Korren wrote:
> > > > > On 25 May 2016 at 12:44, Eyal Edri <[email protected]> wrote:
> > > > > > OK,
> > > > > > I suggest testing with a VM on local disk (preferably on a host
> > > > > > with an SSD configured). If it works, let's expedite moving all
> > > > > > VMs, or at least a large number of them, until we see the network
> > > > > > load reduced.
> > > > >
> > > > > This is not that easy: oVirt doesn't support mixing local disk and
> > > > > shared storage in the same cluster, so we will need to move hosts to
> > > > > a new cluster for this.
> > > > > Also, we will lose the ability to use templates, or otherwise have
> > > > > to create the templates on each and every disk.
> > > > >
> > > > > The scratch disk is a good solution for this: you can keep the OS
> > > > > image on the central storage and the ephemeral data on the local
> > > > > disk.
> > > > >
> > > > > WRT the storage architecture - a single huge (10.9T) ext4 filesystem
> > > > > is used on top of the DRBD. This is probably not the most efficient
> > > > > setup (XFS would probably have been better; RAW via iSCSI, even
> > > > > better).
> > > >
> > > > That was done more than 3 years ago; XFS was not as stable, widely
> > > > used, or well supported back then.
> > >
> > > AFAIK it pre-dates ext4.
> >
> > It does, but on el6 it performed far worse, and with more bugs (according
> > to the reviews at the time).
> >
> > > In any case, this does not detract from the fact that the current
> > > configuration is not as efficient as we can make it.
> >
> > It does not. I agree it's better to focus on what we can do from now on,
> > not on what should have been done back then.
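To make the "one DRBD device per disk" idea discussed below concrete, here is a rough sketch of what a single per-disk resource could look like in DRBD 8.x syntax. This is purely illustrative: the resource name, hostnames, addresses, and device paths are all invented, not taken from the actual setup.

```
# hypothetical /etc/drbd.d/jenkins-disk1.res -- one resource per physical disk
resource jenkins-disk1 {
    device    /dev/drbd1;
    disk      /dev/sdb;        # one underlying disk instead of the whole RAID
    meta-disk internal;

    on storage-01 {            # hostnames and addresses are invented
        address 10.0.0.1:7789;
    }
    on storage-02 {
        address 10.0.0.2:7789;
    }
}
```

Each such resource could then be exported as its own iSCSI LUN, letting different VMs hit different spindles concurrently.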
> > > > >
> > > > > I'm guessing that those 10.9TB are not made from a single disk but
> > > > > from a hardware RAID of some sort. In that case, deactivating the
> > > > > hardware RAID and re-exposing the disks as multiple separate iSCSI
> > > > > LUNs (that are then re-joined into a single storage domain in oVirt)
> > > > > will let different VMs work concurrently on different disks. This
> > > > > should lower the per-VM storage latency.
> > > >
> > > > That would get rid of the DRBD too; it's a totally different setup,
> > > > built from scratch (no NFS either).
> > >
> > > We can and should still use DRBD, just set up a device for each disk.
> > > But yeah, NFS should probably go away.
> > > (We are seeing dramatically better performance with iSCSI in
> > > integration-engine.)
> >
> > Then I don't understand what you said about splitting the hardware RAIDs;
> > do you mean setting up one DRBD device on top of each hard drive instead?
>
> Though I really think we should move to Gluster/Ceph instead for the
> Jenkins VMs. Does anyone know the current status of hyperconverged?

I don't think either Gluster or hyperconverged is stable enough yet to move
all production infra onto. Hyperconverged is also not supported in oVirt yet
(it might be a 4.x feature).

> That would give us better scalable distributed storage, and let us properly
> use the hosts' local disks (we currently have more space on the combined
> hosts than on the storage servers).

I agree a stable distributed storage solution is the way to go if we can
find one :)

> >
> > btw.
> > I think the NFS is used for more than just the engine storage domain
> > (just keep in mind that this has to be checked if we are going to get
> > rid of it).
> >
> > > > >
> > > > > Looking at the storage machine I see strong indications it is IO
> > > > > bound: the load average is ~12 while there are just 1-5 working
> > > > > processes, and the CPU is ~80% idle with the rest in IO wait.
> > > > >
> > > > > Running 'du *' at:
> > > > > /srv/ovirt_storage/jenkins-dc/658e5b87-1207-4226-9fcc-4e5fa02b86b4/images
> > > > > one can see that most images are ~40G in size (that is _real_ 40G,
> > > > > not sparse!). This means that despite most VMs being created from
> > > > > templates, the VMs are full template copies rather than COW clones.
> > > >
> > > > It should not be like that; maybe the templates are misconfigured? Or
> > > > the Foreman images?
> > >
> > > This is the expected behaviour when creating a VM from a template in
> > > the oVirt admin UI. I thought Foreman might behave differently, but it
> > > seems it does not.
> > >
> > > This behaviour is determined by the parameters you pass to the engine
> > > API when instantiating a VM, so it most probably has nothing to do
> > > with the template configuration.
> >
> > So maybe a misconfiguration in Foreman?
> >
> > > > >
> > > > > What this means is that using pools (where all VMs are COW copies
> > > > > of the single pool template) is expected to significantly reduce
> > > > > storage utilization and therefore the IO load on it (the less you
> > > > > store, the less you need to read back).
> > > >
> > > > That should happen without pools too, with normal qcow templates.
> > >
> > > Not unless you create all the VMs via the API and pass the right
> > > parameters. Pools are the easiest way to ensure you never mess that
> > > up...
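As a hedged sketch of the point about API parameters (not the exact call Foreman makes): in the oVirt v4 REST API, a `clone` query parameter on VM creation decides between a thin COW copy and a full disk copy; the exact shape varies by API version, and the VM, template, and cluster names below are made up for illustration.

```python
# Illustrative only: builds the request that would be POSTed to the engine.
# The "clone" query parameter follows the oVirt v4 REST API; all names are
# invented, and nothing is actually sent to a server here.

def build_vm_from_template_request(name, template, cluster, clone=False):
    """Return (path_with_query, xml_body) for POST /ovirt-engine/api/vms.

    clone=False -> thin/COW copy sharing the template's base image (small)
    clone=True  -> full, independent copy of the template disks (~40G each,
                   which matches what 'du' shows on the storage server)
    """
    body = (
        "<vm>"
        f"<name>{name}</name>"
        f"<cluster><name>{cluster}</name></cluster>"
        f"<template><name>{template}</name></template>"
        "</vm>"
    )
    path = f"/ovirt-engine/api/vms?clone={'true' if clone else 'false'}"
    return path, body

# A thin clone keeps the image small; pools give you this by default.
path, body = build_vm_from_template_request(
    "jenkins-slave-01", "el7-template", "jenkins", clone=False)
print(path)  # /ovirt-engine/api/vms?clone=false
```

The same request with `clone=True` is what produces the full 40G copies seen on the storage server.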
> >
> > That was the idea.
> >
> > > > And in any case, that will not lower the steady-state IO when we are
> > > > not actually creating VMs, as any read and write will still hit the
> > > > disk anyhow; it only alleviates the IO when creating new VMs.
> > >
> > > Since you are reading the same bits over and over (for different VMs),
> > > you let the various buffer caches along the way (in the storage
> > > machines and in the hypervisors) do what they are supposed to.
> >
> > Once a VM is started, almost everything it needs is in RAM, so there are
> > not that many reads from disk unless you start writing to it, and that's
> > mostly what we are hitting: lots of writes.
> >
> > > > The local disk (scratch disk) is the best option, imo, now and for
> > > > the foreseeable future.
> > >
> > > This is not an either/or thing; IMO we need to do both.
> >
> > I think it's far more useful, because it will solve our current issues
> > faster and for longer, so IMO it should get more attention sooner.
> >
> > Any improvement that does not remove the current bottleneck is not
> > really giving any value to the overall infra (even if it might become
> > valuable later).
> >
> > > --
> > > Barak Korren
> > > [email protected]
> > > RHEV-CI Team
> >
> > --
> > David Caro
> >
> > Red Hat S.L.
> > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> >
> > Tel.: +420 532 294 605
> > Email: [email protected]
> > IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> > Web: www.redhat.com
> > RHT Global #: 82-62605
>
> --
> David Caro
>
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>
> Tel.: +420 532 294 605
> Email: [email protected]
> IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> Web: www.redhat.com
> RHT Global #: 82-62605

--
Eyal Edri
Associate Manager
RHEV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
_______________________________________________
Infra mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/infra
