On Wed, 2010-04-07 at 15:39 +0200, Andrew Beekhof wrote:
> > I increased the timeout
> > even further (to 120s instead of the minimum recommended 60) and it
> > seems to be working. Curious though, because when it does work, the logs
> > show that the entire stop operation, including a live migration, takes
> > only about 7 seconds.
>
> It depends on what else the machine is doing.
> Are there any other Xen instances that might be migrating too?

The test cluster currently has two Xen VMs. One is tied to a particular DRBD volume, with colocation and order constraints, so on failover it must shut down, wait for the DRBD/Filesystem/LVM stack to fail over, and restart. Still, even that doesn't take more than 60 seconds. The other VM is stored on an NFS volume so that it can live migrate (allow-migrate="true"). I have seen failures of the stop operation on both of them prior to increasing the timeout.

Surely it's not handling the resources sequentially? That would be a disaster if we get to where I want to be going, which may involve dozens or even hundreds of VMs on a cluster. I realize I may have to raise the timeout further, simply because a few dozen VMs shutting down in parallel will take longer than one or two, since they share host OS resources, but hopefully the timeout won't need to grow linearly with the number of VMs.

--Greg
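
P.S. For anyone following the thread, the migratable VM's definition would look roughly like the sketch below in crm shell syntax. The resource name vm-nfs and the xmfile path are placeholders made up for illustration, not the actual configuration; only allow-migrate="true" and the 120s stop timeout come from the discussion above.

```
# Sketch only: "vm-nfs" and the xmfile path are hypothetical placeholders.
primitive vm-nfs ocf:heartbeat:Xen \
    params xmfile="/etc/xen/vm-nfs.cfg" \
    meta allow-migrate="true" \
    op stop timeout="120s"
```

The stop timeout here is per resource, so each VM gets its own 120s budget rather than sharing one across the cluster.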
