> On 23 Jul 2017, at 11:58, Roy Golan <[email protected]> wrote:
>
> On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani <[email protected]> wrote:
>> Hi all,
>>
>> With libvirt 3.2.0 and onwards, it seems we now have the tools to solve
>> https://bugzilla.redhat.com/show_bug.cgi?id=1181665
>> and eventually get rid of the disk polling we do. This change is
>> expected to have a huge impact on performance, so I'm working on it.
>>
>> I had plans for a comprehensive refactoring in this area, but a
>> solution backportable to 4.1.z is appealing, so I started with this
>> first, saving the refactoring (which I still very much want) for later.
>>
>> Quick summary: libvirt >= 3.2.0 allows us to set a threshold on any
>> node in the backing chain of each drive of a VM
>> (https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold),
>> and to fire one event exactly once when that threshold is crossed. The
>> event needs to be explicitly rearmed afterwards.
>> This is exactly what we need to get rid of polling in the steady state,
>> so far so good.
>>
>> The problem is: we can't use this for some important flows we have,
>> which involve the use of disks not (yet) attached to a given VM.
>>
>> Possibly affected flows:
>>
>> - live storage migration:
>>   we use flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
>>   libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
>>   VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
>>   meaning that Vdsm is in charge of handling the volume
>>
>> - snapshots:
>>   we use snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
>>   libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)
>>   (same meaning as above)
>>
>> - live merge: should be OK (according to a glance at the source and a
>>   chat with Adam).
>>
>> So it looks like we will need to bridge this gap.
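For readers unfamiliar with the API: the one-shot semantics described above (set a threshold with virDomainSetBlockThreshold, get the event at most once, then explicitly rearm) can be modelled without libvirt at all. Here is a minimal pure-Python sketch of that latch behaviour; the class and method names are mine, not Vdsm's or libvirt's:

```python
class BlockThreshold:
    """Model of libvirt's one-shot block-threshold semantics:
    once armed, the event fires at most once, then must be rearmed."""

    def __init__(self):
        self.threshold = None  # bytes; None means disarmed

    def arm(self, threshold):
        # Arming again simply replaces the previous threshold.
        self.threshold = threshold

    def check(self, allocation):
        """Return True (and disarm) the first time the allocation
        crosses the armed threshold; return False otherwise."""
        if self.threshold is not None and allocation >= self.threshold:
            self.threshold = None  # one-shot: disarmed on delivery
            return True
        return False


drive = BlockThreshold()
drive.arm(512)
assert drive.check(600) is True    # fires once when crossed...
assert drive.check(700) is False   # ...then stays silent until rearmed
drive.arm(1024)
assert drive.check(2048) is True
```

With the real API the rearm is a fresh call to the domain's setBlockThreshold for the affected device; the model above only captures the fire-once-then-rearm contract the thread relies on.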
>> So we can still use the BLOCK_THRESHOLD event for the steady state, and
>> avoid polling in the vast majority of the cases.
>> By "steady state" I mean that the VM is running, with no administration
>> operation (snapshot, live merge, live storage migration...) in progress.
>> I think it is fair to assume that VMs are in this state the vast
>> majority of the time.
>> For the very important cases in which we cannot depend on events, we
>> can fall back to polling, but in a smarter way:
>> instead of polling everything every 2s, let's poll just the drives
>> involved in the ongoing operations.
>> Those should be far fewer than the total number of drives, and polled
>> for a far shorter time than today, so polling should be practical.
>> Since the event fires once, we will need to rearm it only if the
>> operation is ongoing, and only just before starting it (both conditions
>> are easy to check).
>> We can disable the polling on completion, or on error. This per se is
>> easy, but we will need a careful review of the flows, and perhaps some
>> safety nets in place.

> Consider fusing polling and events into a single pipeline of events so
> they can be used together. If a poll triggers an event (with a
> distinguished origin), then all the handling is done in one place and it
> should be easy to stop or start polling, or remove it entirely.

>> Anyway, should we fail to disable the polling, we will "just" have some
>> overhead.
>>
>> On recovery, we will need to make sure to rearm all the relevant
>> events, but we can just plug into the recovery we must do already, so
>> this should be easy as well.

> What is needed in order to 'rearm' it? Is there an API to get the state
> of the event subscription?
> If we lost an event, how do we know to rearm it? Is rearming idempotent?
>
> Remind me, do we extend a disk if the VM pauses with an out-of-space
> event?
>
> How will we handle 2 subsequent events if we didn't extend between them?
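Roy's suggestion of fusing polling and events into one pipeline could look roughly like this. This is only a sketch with invented names (DrivePipeline, Origin), not actual Vdsm code: both the libvirt event callback and the targeted fallback poller feed the same handler, with the origin recorded so either source can be switched off independently.

```python
from enum import Enum


class Origin(Enum):
    EVENT = "event"  # delivered by the libvirt block-threshold event
    POLL = "poll"    # synthesized by the fallback poller


class DrivePipeline:
    """All high-watermark notifications, whether from libvirt events
    or from the fallback poller, funnel into one handler."""

    def __init__(self, extend):
        self.extend = extend  # callback: (drive, origin) -> None
        self.polled = set()   # only drives involved in ongoing operations

    def start_polling(self, drive):
        self.polled.add(drive)

    def stop_polling(self, drive):
        self.polled.discard(drive)

    def on_threshold(self, drive, origin):
        # Single entry point: handling is shared, origin is just metadata.
        self.extend(drive, origin)

    def poll_cycle(self, allocations, thresholds):
        # Poll only the tracked drives, not every drive every 2s.
        for drive in sorted(self.polled):
            if allocations[drive] >= thresholds[drive]:
                self.on_threshold(drive, Origin.POLL)


calls = []
p = DrivePipeline(lambda d, o: calls.append((d, o)))
p.start_polling("vda")                     # e.g. during live storage migration
p.on_threshold("vdb", Origin.EVENT)        # steady-state, event-driven path
p.poll_cycle({"vda": 900}, {"vda": 512})   # fallback, poll-driven path
assert calls == [("vdb", Origin.EVENT), ("vda", Origin.POLL)]
```

The appeal of this shape is exactly what Roy notes: stopping, starting, or entirely removing the poller touches only the poll_cycle side, while the extend logic stays in one place.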
> (expecting the extend to be an async operation)

>> So it seems to me this could fly and we can actually have the
>> performance benefits of events.
>>
>> However, due to the fact that we need to review some existing and
>> delicate flows, I think we should still keep the current polling code
>> around for the next release.

> +1

>> I believe the best route is:
>>
>> 1. offer the new event-based code for 4.2, keep the polling around.
>>    Default to events for performance
>>
>> 2. remove the polling completely in 4.3

> Still wonder if removing them totally is good. The absence of the
> events should be supervised somehow; today, for instance, a failure to
> poll getstats of a domain results in the VM going unresponsive. Not the
> most accurate state, but at least it gives some visibility. So polling
> should cover us where events fail. (similar to engine's VM monitoring)
It is a different case. With disk extensions there is always the fallback of
actually hitting 100%, pausing the VM, and triggering the extend anyway. So I
do not think there is a need for another mechanism for when an event is
missed (e.g. due to a Vdsm restart).

>> I'm currently working on the patches here:
>> https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal
>>
>> Even though the basics are in place, I don't think they are ready for
>> review yet.
>>
>> Comments welcome, as usual.
>>
>> --
>> Francesco Romani
>> Senior SW Eng., Virtualization R&D
>> Red Hat
>> IRC: fromani github: @fromanirh
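The safety net described here (a missed threshold event still ends with the guest pausing on ENOSPC, which itself triggers the extend, after which the VM resumes) can be sketched as a tiny state machine. All names below are hypothetical illustrations, not actual Vdsm code:

```python
class Vm:
    """Model of the fallback path: even if a block-threshold event is
    missed, the guest pausing on ENOSPC still triggers the extend."""

    def __init__(self):
        self.paused_reason = None
        self.extended = []  # drives extended so far

    def on_suspend(self, drive, reason):
        # Delivered by the hypervisor when guest I/O stops.
        self.paused_reason = reason
        if reason == "enospc":
            self.extend_drive(drive)

    def extend_drive(self, drive):
        # In reality this would ask the storage layer for more space,
        # asynchronously; here we extend and resume immediately.
        self.extended.append(drive)
        self.resume()

    def resume(self):
        self.paused_reason = None


vm = Vm()
vm.on_suspend("vda", "enospc")   # missed-event fallback path
assert vm.extended == ["vda"]
assert vm.paused_reason is None  # VM resumed after the extend
```

This is why no second recovery mechanism is needed for lost events: the pause-on-ENOSPC path is the backstop, at the cost of a brief guest pause.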
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
