----- Original Message -----
> From: "Ryan Harper" <[email protected]>
> To: "Doron Fediuck" <[email protected]>
> Cc: "Ryan Harper" <[email protected]>, "Sheldon" <[email protected]>,
> [email protected], "Zheng Sheng ZS Zhou" <[email protected]>,
> "Itamar Heim" <[email protected]>, [email protected],
> "Shu Ming" <[email protected]>, "Mark Wu" <[email protected]>,
> [email protected], [email protected]
> Sent: Monday, November 26, 2012 5:50:34 PM
> Subject: Re: [vdsm] Review Request: Add an option to create a watchdog device.
>
> * Doron Fediuck <[email protected]> [2012-11-26 09:20]:
> > ----- Original Message -----
> > > From: "Ryan Harper" <[email protected]>
> > > To: "Doron Fediuck" <[email protected]>
> > > Cc: "Sheldon" <[email protected]>, [email protected],
> > > "Zheng Sheng ZS Zhou" <[email protected]>,
> > > "Itamar Heim" <[email protected]>, [email protected],
> > > "Shu Ming" <[email protected]>, "Mark Wu" <[email protected]>,
> > > [email protected], [email protected], [email protected]
> > > Sent: Monday, November 26, 2012 4:01:48 PM
> > > Subject: Re: [vdsm] Review Request: Add an option to create a
> > > watchdog device.
> > >
> > > * Doron Fediuck <[email protected]> [2012-11-22 03:56]:
> > > >
> > > > ----- Original Message -----
> > > >
> > > > > From: "Sheldon" <[email protected]>
> > > > > To: "Doron Fediuck" <[email protected]>
> > > > > Cc: [email protected], "Zheng Sheng ZS Zhou" <[email protected]>,
> > > > > "Itamar Heim" <[email protected]>, [email protected],
> > > > > "Shu Ming" <[email protected]>, "Mark Wu" <[email protected]>,
> > > > > [email protected], [email protected], [email protected]
> > > > > Sent: Thursday, November 22, 2012 11:00:18 AM
> > > > > Subject: Re: [vdsm] Review Request: Add an option to create a
> > > > > watchdog device.
> > > >
> > > > > On 11/21/2012 04:00 PM, Doron Fediuck wrote:
> > > > >
> > > > > > > Currently, we do not have any plans to implement the engine
> > > > > > > side of the feature.
> > > > > > >
> > > > > > > But I will add a watchdog feature page to describe how the
> > > > > > > engine enables this feature. It would definitely be great if
> > > > > > > any engine guy would like to take the engine part. I will be
> > > > > > > glad to provide help if needed.
> > > > > > >
> > > > > > Hi Sheldon,
> > > > > > Any news on the engine side?
> > > > > > Currently the vdsm side is merged, while the engine side is
> > > > > > still missing.
> > > > > > The wiki page also lacks the engine side. Can you please
> > > > > > handle it?
> > > > > >
> > > > > Hi Doron,
> > > > >
> > > > > I have updated the wiki page.
> > > > > http://wiki.ovirt.org/wiki/Add_an_option_to_create_a_watchdog_device
> > > > > And for the vdsm side, I should also add a new patch to report
> > > > > the watchdog event.
> > > > >
> > > > > I can add a flag to the VM's status, so engine can poll the VM's
> > > > > status to check the event, then notify the user and let the user
> > > > > take some action, such as restarting or dumping the guest for
> > > > > analysis.
> > > > > Perhaps an event report channel would be better, but I have not
> > > > > found any in vdsm, and it is a big job to add an event
> > > > > registration mechanism to vdsm.
> > > > >
> > > > > What's your suggestion?
> > > > >
> > > > > --
> > > > > Sheldon Feng <[email protected]>
> > > > > IBM Linux Technology Center
> > > >
> > > > Hi Sheldon,
> > > > AFAIK, watchdog fires automatically, so no real need for user
> > > > interaction when an event happens.
> > > > So I'd expect the user to set the relevant action
> > > > before starting the VM. Once the watchdog is triggered, it will
> > > > do whatever action he has set, and notify the user.
> > > >
> > > > So I'd expect the user to have a list of actions for the watchdog
> > > > device in the engine UI, with a default of none. The user should
> > > > be able to choose which action to set when starting or editing
> > > > the VM (for next run).
> > >
> > > I'd like to suggest we pick something other than none by default
> > > since we've gone through the trouble of configuring and enabling a
> > > watchdog. I think it's worth discussing what a better default
> > > behavior should be, given access to a watchdog.
> > >
> > > I'd suggest that a simple reboot mode would be most useful.
> > >
> >
> > Hi Ryan, good point.
> > The reason I asked for none is exactly that someone thought of it
> > when writing the device actions, i.e. otherwise no-op makes no sense,
> > but as we all know no-op sometimes proves to be a much needed option,
> > if not the default one.
> > In this context, a watchdog has quite an explosive potential for a VM.
> > So for the sake of all users I'd rather ask them to specify exactly
> > what should be done. Otherwise: primum non nocere. I'm sure one day
> > someone will appreciate it.
>
> While I understand what you're saying, I think it's worth actually
> walking through all of the actions and selecting the best here. VDSM
> has a role to play here in how *best* to configure a VM. I think that
> a watchdog can elevate the usefulness of a VM by ensuring that it
> stays running without user intervention.
>

Ryan, you're mixing vdsm and engine.
My response was about the way the engine UI will present it to the user:

> > > > So I'd expect the user to have a list of actions for the watchdog
> > > > device in the engine UI, with a default of none. The user should
> > > > be able to choose which action to set when starting or editing
> > > > the VM (for next run).

So this is not about vdsm, but about the engine UI.
As for VDSM's role in choosing the best VM configuration, I disagree on
this point. What's best for your VM will not always be best for my VM,
especially when reboot is being considered. So unless there's a 100%
fool-proof reason, do no harm.

> As you say, having an unexpected reboot when it's not wanted can cause
> an issue, so we have at least two areas to discuss:
>
> 1) watchdog fidelity; does it do what it's supposed to do at the right
> time and not malfunction. This requires testing and use to validate.
> Leaving the watchdog off by default will certainly reduce the amount
> of testing time.
>
> 2) watchdog configuration. What's the most reasonable and helpful
> configuration, this includes the action as well as any variables
> associated with that specific action. I think the best course here is
> to propose an initial configuration and start getting some test-time
> under the configuration for validation.
>

Ryan, just reminding you this is an engine UI thread. As such I'd be
very careful about rebooting anything as a default. This is not an audio
or VGA card where you can fall back to a lower resolution; this will kill
your guest, with everything running in it.

> If we're unwilling to enable an action by default, I'd like to have a
> discussion around why that's the case. The initial objection to
> always-on with action=reboot seems to be concern about the watchdog
> misfiring when it shouldn't. Are there other concerns?
>

Yes.
Googling will provide you with several watchdog-related cases, which I
can't quote here due to the copyright on the relevant KBs. The general
idea is that one of 3 things causes the WD to fire:
1. Watchdog driver issues
2. Guest OS low on resources (potentially swapping), but still running
3. Host issues, such as sockets exhausted, etc.

The main thing is that in none of the above will rebooting the VM
improve the situation. If anything, it will make it worse. By default...

> Another thought here is to think about the target guest OS type. It
> may be the case that specific actions/configurations make sense for
> one OS, but not the other[1]
>
> There was an engine-devel thread about libosinfo integration[2].
>

See my previous comment for relevant cases. As a default watchdog policy
I'd rather be safe than sorry. Most KBs I saw would tell you to stop the
watchdog service / remove the device to begin with. Then you get a bug
fix. But as you probably understand, for some users this already did
some damage.

One more thing you need to consider is exporting and importing VMs, as
well as VM templates and pools. Here as well you may get an unpleasant
surprise if you use a VM with a watchdog that will bite by default.

> 1. http://rwmj.wordpress.com/2010/03/03/what-is-a-watchdog/#comment-4959
> 2. http://lists.ovirt.org/pipermail/engine-devel/2012-September/002544.html
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> [email protected]
>
_______________________________________________
Arch mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/arch
