On Wed, Oct 25, 2017 at 4:12 PM, Jiri Denemark <[email protected]> wrote:
> On Tue, Oct 24, 2017 at 10:34:53 -0700, Prerna Saxena wrote:
> >
> > As noted in
> > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html
> > the libvirt QEMU driver handles all async events from the main loop.
> > Handling each event needs the per-VM lock to make forward progress.
> > When an async event is received for a VM that already has an RPC
> > running, the main loop is held up contending for the same lock.
> >
> > This impacts scalability and should be addressed as a priority.
> >
> > Note that libvirt does have a 2-step deferred handling for a few event
> > categories, but (1) that is insufficient, since blocking happens before
> > the handler can disambiguate which events need to be posted to this
> > other queue, and (2) there needs to be homogeneity.
> >
> > The current series builds a framework for recording and handling VM
> > events.
> > It initializes a per-VM event queue, and a global event queue pointing
> > to events from all the VMs. Event handling is staggered in 2 stages:
> > - When an event is received, it is enqueued in the per-VM queue as well
> >   as the global queue.
> > - The global queue is built into the QEMU driver as a threadpool
> >   (currently with a single thread).
> > - Enqueuing a new event triggers the global event worker thread, which
> >   then attempts to take the lock for this event's VM.
> >   - If the lock is available, the event worker runs the function
> >     handling this event type. Once done, it dequeues this event from
> >     the global as well as per-VM queues.
> >   - If the lock is unavailable (i.e. taken by an RPC thread), the event
> >     worker thread leaves this event as-is and picks up the next event.

> If I get it right, the event is either processed immediately when its VM
> object is unlocked or it has to wait until the current job running on
> the VM object finishes even though the lock may be released before that.
> Correct? If so, this needs to be addressed.

In most cases, the lock is released just before we end the API. However,
it is a small change that can be made.

> > - Once the RPC thread completes, it looks for events pertaining to the
> >   VM in the per-VM event queue. It then processes the events serially
> >   (holding the VM lock) until there are no more events remaining for
> >   this VM. At this point, the per-VM lock is relinquished.
> >
> > Patch series status:
> > Strictly RFC only. No compilation issues. I have not had a chance to
> > (stress) test it after rebasing to latest master.
> > Note that documentation and test coverage are TBD, since a few open
> > points remain.
> >
> > Known issues/caveats:
> > - RPC handling time will become non-deterministic.
> > - An event will only be "notified" to a client once the RPC for the
> >   same VM completes.
> > - Needs careful consideration in all cases where a QMP event is used
> >   to "signal" an RPC thread, else it will deadlock.

> This last issue is actually a show stopper here. We need to make sure
> QMP events are processed while a job is still active on the same domain.
> Otherwise things like block jobs and migration, which are long-running
> jobs driven by events, will break.
>
> Jirka

Completely agree, which is why I have explicitly mentioned this. However,
I do not completely follow why it needs to be this way. Can the block job
APIs between QEMU <--> libvirt be fixed so that such behaviour is avoided?

Regards,
Prerna
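
To make the two-stage dispatch above concrete, what follows is a minimal
pthreads sketch, not code from the series: every name in it (vmDomain,
vmEvent, eventEnqueue, eventWorkerIterate, vmDrainEvents) is hypothetical,
and the locking is simplified to a single global mutex guarding both queue
heads, where the real patches would build on libvirt's own domain objects
and thread pool.

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct vmDomain vmDomain;
    typedef struct vmEvent vmEvent;

    /* One queued async (QMP) event with its per-event-type handler. */
    struct vmEvent {
        vmDomain *vm;                     /* VM this event belongs to */
        void (*handle)(vmDomain *vm, void *data);
        void *data;
        vmEvent *nextPerVM;               /* link in the per-VM queue */
        vmEvent *nextGlobal;              /* link in the global queue */
    };

    /* Minimal stand-in for a per-VM object and its lock, which RPC
     * threads also hold while running an API on the domain. */
    struct vmDomain {
        pthread_mutex_t lock;
        vmEvent *events;                  /* head of the per-VM queue */
    };

    /* Global queue drained by a (single-threaded) event worker. One
     * mutex protects both list heads to keep the sketch short. */
    static pthread_mutex_t globalLock = PTHREAD_MUTEX_INITIALIZER;
    static vmEvent *globalQueue;

    /* Stage 1: enqueue on both queues. Waking the worker (condvar or
     * thread pool job submission) is omitted for brevity. */
    static void eventEnqueue(vmEvent *ev)
    {
        pthread_mutex_lock(&globalLock);
        ev->nextGlobal = globalQueue;
        globalQueue = ev;
        ev->nextPerVM = ev->vm->events;
        ev->vm->events = ev;
        pthread_mutex_unlock(&globalLock);
    }

    /* Unlink helpers; caller must hold globalLock. */
    static void globalDequeue(vmEvent *ev)
    {
        vmEvent **p = &globalQueue;
        while (*p && *p != ev)
            p = &(*p)->nextGlobal;
        if (*p)
            *p = ev->nextGlobal;
    }

    static void perVMDequeue(vmEvent *ev)
    {
        vmEvent **p = &ev->vm->events;
        while (*p && *p != ev)
            p = &(*p)->nextPerVM;
        if (*p)
            *p = ev->nextPerVM;
    }

    /* Stage 2: the worker scans the global queue. An event whose VM
     * lock is free is handled and dequeued; an event whose VM is busy
     * with an RPC is skipped, to be drained later by the RPC thread. */
    static void eventWorkerIterate(void)
    {
        pthread_mutex_lock(&globalLock);
        vmEvent **p = &globalQueue;
        while (*p) {
            vmEvent *ev = *p;
            if (pthread_mutex_trylock(&ev->vm->lock) != 0) {
                p = &ev->nextGlobal;      /* VM busy: leave event as-is */
                continue;
            }
            *p = ev->nextGlobal;          /* dequeue from both queues */
            perVMDequeue(ev);
            pthread_mutex_unlock(&globalLock);
            ev->handle(ev->vm, ev->data); /* run outside globalLock */
            pthread_mutex_unlock(&ev->vm->lock);
            free(ev);
            pthread_mutex_lock(&globalLock);
            p = &globalQueue;             /* list changed; rescan */
        }
        pthread_mutex_unlock(&globalLock);
    }

    /* Called by an RPC thread just before it releases vm->lock:
     * process events that piled up while the RPC was running. */
    static void vmDrainEvents(vmDomain *vm)
    {
        pthread_mutex_lock(&globalLock);
        while (vm->events) {
            vmEvent *ev = vm->events;
            vm->events = ev->nextPerVM;
            globalDequeue(ev);
            pthread_mutex_unlock(&globalLock);
            ev->handle(vm, ev->data);     /* still holding vm->lock */
            free(ev);
            pthread_mutex_lock(&globalLock);
        }
        pthread_mutex_unlock(&globalLock);
    }

The sketch also makes the discussed caveat visible: vmDrainEvents runs
only at the end of an RPC, so an event arriving for a busy VM is not
handled until that RPC completes, which is the behaviour Jiri flags as a
show stopper for event-driven jobs such as block jobs and migration.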
