----- Original Message ----- > From: "Dan Kenigsberg" <[email protected]> > To: "Francesco Romani" <[email protected]> > Cc: "vdsm-devel" <[email protected]>, [email protected] > Sent: Tuesday, April 8, 2014 6:57:15 PM > Subject: Re: cleaning statistics retrieval > > On Tue, Apr 08, 2014 at 06:52:50AM -0400, Francesco Romani wrote: > > Hello VDSM developers, > > Please use devel@ovirt, and mention "vdsm" at the subject. This thread > in particular involves Engine as well.
Right, I forgot. Sorry about that. > > I'd like to discuss and explain the plans for cleaning up Vm.getStats() > > in vdsm/virt/vm.py, and how it affects a bug we have: > > https://bugzilla.redhat.com/show_bug.cgi?id=1073478 > > > > Let's start from the bug. > > > > To make a long story short, there is a (small) race in VDSM, probably > > introduced by commit > > 8fedf8bde3c28edb07add23c3e9b72681cb48e49 > > The race can actually be triggered only if the VM is shutting down, so it > > is not that bad. > > > > Fixing the reported issue in the specific case can be done with a trivial > > if, and that it what I did > > in my initial http://gerrit.ovirt.org/#/c/25803/1/vdsm/vm.py,cm > > Could you explain how an AttributeError there moved the VM to Down? This should actually be this bug of engine https://bugzilla.redhat.com/show_bug.cgi?id=1072282 if GetVmStats fails for whatever reason, engine thinks the VM is down. > > And this is the core of the issue. > > My initial idea is to either return success and a complete, well formed > > statistics set, or return an error. > > However current engine seems to not cope properly with this, and we cannot > > break backward compatibility. > Would you be more precise? If getAllVmStats returns an errCode for one > of the Vms, what happens? Of course. For GetAllVmStats, AFAIK, but please correct me if I am wrong, because I'm not really expert on engine side, engine simply does not expects anything different from a list of either a RunningVmStats or an ExitedVmStats. So not sure (will verify just after this mail) if engine can cope with mixed content, meaning [ stats, errCode, stats, stats ... ] For GetVmStats, like Michal said, the engine expects the call to succeed otherwise it puts the VM into the Unknown state. > > > > > Looks like the only way to go is to always return success and to add a > > field to describe the content of the > > statistics (partial, complete...). THis is admittedly a far cry from the > > ideal solution, but it is dictated > > by the need to preserve the compatibility with current/old engines. > > I don't think that I understand your suggestion, but it does not sound > right to send a partial dictionary and to "appologize" about its > paritiality elsewhere. The dictionary may be "partial" for engine-4.0 > yet "complete" for engine-3.5. It's not for Vdsm to grade its own > output. I see your point (that's one of the reasons I'm not happy about this solution). Please see below for the detauls. > > please note that I'm not really happy about this solution, but, given the > > constraint, I don't see better alternatives. > > Please explain the benefits of describing the partial content, as I do > not see them. The root issue here is the getStats must always succeed, because the engine doesn't expect otherwise and thus we guarantee this to cope with old engines; but inside VDSM, getStats can actually fail in a lot of places (like in this case getBalloonInfo) So, in VDSM we can end up with a partial response, and now the question is: what to do with this partial response? And if there are optional fields in the response, how can the engine distinguish between * full response, but with an optional field missing and * partial response (because of an exception inside VDSM), without some field, incidentally including an optional one ? The VDSM 'grading' was an hint from VDSM to help engine to distinguish between those cases. Even if we agree this grading idea is bad, the core issue remains open: what to do if we end up with a partial response? For example, let's say we handle the getBalloonInfo exception (http://gerrit.ovirt.org/#/c/26539/), the stats object to be returned will lack * the (mandatory, expected) balloon stats * the (optional) migration progress - ok, bad example because this makes sense only during migrations, but other optional fields may be added later and the issue remains Again, anyone feel free to correct me if I misunderstood something about engine (or VDSM <=> engine communication) and to suggest better alternatives :\ Thanks and bests, -- Francesco Romani RedHat Engineering Virtualization R & D Phone: 8261328 IRC: fromani _______________________________________________ Devel mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/devel
