Just wanted to ping on this again. I'm confident enough in it if widow doesn't get a chance to respond. On Mar 13, 2013 1:27 PM, "Marcus Sorensen" <shadow...@gmail.com> wrote:
> Ok, the fix for CLOUDSTACK-600 is the first patch mentioned plus two > fixes to it. The first additional patch handles upgrades, and the > second one is a minor fix for when we shut down a non-persistent VM > that will avoid printing useless scary messages (we throw an error in > the timeout loop that waits for the vm to shut down when it succeeds, > because the domain can't be found). > > I've tested this patch set both in a fresh 4.1 install and a fresh > master install, and both were able to start/stop VMs successfully. > When upgrading an existing 4.1 install (without the patches), the VMs > were undefined on shutdown as they should be. > > commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a > Author: Wido den Hollander <w...@widodh.nl> > Date: Thu Feb 7 22:58:20 2013 +0100 > > agent: Do not define domains persistent in libvirt > > We used to define domains persistent in libvirt, which caused XML > definitionsto stay there after a reboot of the hypervisor. > > We however don't do anything with those already defined domains, > actually, we wipe all defined domains when starting the agent. > > Some users however reported that libvirt started these domains > after a reboot before the CloudStack agent was started. > > By starting domains from the XML description and not defining them > we prevent them from ever being stored in libvirt. > > commit 8d7d1cd5623a6744a954fb35eeff8408d6381083 > Author: Marcus Sorensen <mar...@betterservers.com> > Date: Wed Mar 13 11:10:56 2013 -0600 > > Summary: KVM - undefine persistent VMs on stop > > Detail: A previous patch fixed an issue where we are defining VMs to > persist > locally on KVM hosts, which can cause issues if the agent isn't > running and > libvirt decides to start the VM unbeknownst to cloudstack. The > previous patch > stopped defining VMs as persistent. This patch adds compatibility > for existing > cloudstack environments, removing the persistent definition on > stop if needed. > > BUG-ID: CLOUDSTACK-600 > > commit 97d2e3fe7772fa01941295397f9d59d35cf47671 > Author: Marcus Sorensen <mar...@betterservers.com> > Date: Wed Mar 13 12:57:46 2013 -0600 > > Summary: KVM - remove harmless message about domain not found on VM > stop > > Detail: When we stop a VM, it's definition is no longer valid. > Therefore, we > need to catch the exception thrown from libvirt in trying to lookup a > non-existent domain by UUID while trying to check if it's shut down. > > BUG-ID:CLOUDSTACK-600 > Signed-off-by: Marcus Sorensen <mar...@betterservers.com> 1363201066 > -0600 > > > On Tue, Mar 12, 2013 at 11:49 PM, Marcus Sorensen <shadow...@gmail.com> > wrote: > > Looks like this patch might need to be adjusted slightly. We can use > > the domain.isPersistent() method to determine if we need to call > > domain.undefine() in addition to dm.shutdown and dm.destroy. That will > > allow upgrades to clean up existing definitions as VMs are shutdown. > > If you don't have a chance to get to that, I'll try to tomorrow. > > > > On Tue, Mar 12, 2013 at 7:23 PM, Marcus Sorensen <shadow...@gmail.com> > wrote: > >> To clarify, this is a fairly serious bug, and it should probably be > >> addressed, I'm just wanting clarification that the patch is good to go > >> as a fix. On my way home I was thinking about how I'm not sure if it > >> addresses upgrades, i.e. does it still allow existing definitions to > >> be wiped, or will those be forever stuck. I haven't even looked yet. > >> > >> Essentially, the bug is that if a KVM host is uncleanly shut down, and > >> then brought back up without starting the agent, the VMs that were > >> running on the system will be started by libvirt/startup scripts, > >> which can be disastrous for shared-storage VMs. So if a host goes > >> down without being put into maintenance mode, for whatever reason, and > >> the agent is either slow to start, the agent fails to start for some > >> reason, or auto-start is disabled for some reason, you're almost > >> guaranteed to have two VMs running off of the same shared storage > >> image. > >> > >> We don't want to make any VMs permanently known to a host, and libvirt > >> provides a simple way to define a VM domain without making it > >> permanent. > >> > >> On Tue, Mar 12, 2013 at 6:42 PM, Chip Childers > >> <chip.child...@sungard.com> wrote: > >>> On Tue, Mar 12, 2013 at 06:39:09PM -0600, Marcus Sorensen wrote: > >>>> This addresses CLOUDSTACK-600. Please get buyoff from Wido if > >>>> possible, since it's his patch. I've reviewed and tested it, but I > >>>> want to make sure I'm not wrong in assuming that this should be in 4.1 > >>>> > >>>> commit 5dfcd309f10e5bd6a918f7fdff3f44a3dff2374a > >>>> Author: Wido den Hollander <w...@widodh.nl> > >>>> Date: Thu Feb 7 22:58:20 2013 +0100 > >>>> > >>>> agent: Do not define domains persistent in libvirt > >>>> > >>>> We used to define domains persistent in libvirt, which caused XML > >>>> definitions > >>>> to stay there after a reboot of the hypervisor. > >>>> > >>>> We however don't do anything with those already defined domains, > >>>> actually, we wipe > >>>> all defined domains when starting the agent. > >>>> > >>>> Some users however reported that libvirt started these domains > >>>> after a reboot > >>>> before the CloudStack agent was started. > >>>> > >>>> By starting domains from the XML description and not defining them > >>>> we prevent > >>>> them from ever being stored in libvirt. > >>>> > >>> > >>> Wido - I'll let you comment before we apply this. >