On 2011-07-29 10:22, Michael Schwartzkopff wrote:
> Hi,
>
> I hope I found the correct list. Playing with the VirtualDomain RA I found two
> problems. Please find the description and patches below.

Sorry for not tending to this for a while, and thanks to Dejan for the
reminder.

> 1) During stop operation libvirt occasionally returns an error because the
> state cannot be determined just the moment the machine is shut down. This
> patch makes the RA try to get the state again one time. If the machine is down
> then everything is OK.
>
> --- /root/VirtualDomain 2011-07-29 08:39:30.652675972 +0200
> +++ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain 2011-07-29 10:08:24.712790703 +0200
> @@ -149,6 +149,7 @@
>  VirtualDomain_Status() {
>      rc=$OCF_ERR_GENERIC
>      status="no state"
> +    bail_wait="yes";
>      while [ "$status" = "no state" ]; do
>          status="`virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME`"
>          case "$status" in
> @@ -177,8 +178,13 @@
>                      # During the stop operation, we want to bail out
>                      # quickly, so as to be able to force-stop (destroy)
>                      # the domain if necessary.
> -                    ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
> -                    return $OCF_ERR_GENERIC;
> +                    ocf_log info "Virtual domain $DOMAIN_NAME has no state during stop operation."
> +                    if [ "$bail_wait" = "no" ]; then
> +                        ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
> +                        return $OCF_ERR_GENERIC;
> +                    fi
> +                    bail_wait="no"
> +                    sleep 1
>                  else
>                      # During all other actions, we just wait and try
>                      # again, relying on the CRM/LRM to time us out if

Can you please configure your mail agent to not insert line breaks when
you send patches? Better still, use git send-email.

At any rate, I consider the patch obsolete (and actually, it was already
when it was submitted), as Lars Ellenberg implemented a "try this three
times" logic in commit ffc83235, on July 1, 2010:

https://github.com/ClusterLabs/resource-agents/commit/ffc8323515c19bc51fe0801fc3d2610878699ce3

> 2) The next problem is that a graceful shutdown sometimes does not work when
> the machine just booted.
> This patch makes the RA send a shutdown command every
> 10 seconds while shutting down the machine. This catches the boot problem.
>
> @@ -234,6 +240,9 @@
>      shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
>      # Loop on status for $shutdown_timeout seconds
>      for i in `seq $shutdown_timeout`; do
> +        if [ $((i%10)) -eq 0 ]; then
> +            virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
> +        fi
>          VirtualDomain_Status
>          status=$?
>          case $status in

I see the point -- if you're issuing a KVM shutdown while the machine is
still booting and the guest's acpid is not started, then the shutdown
effectively doesn't happen. And issuing a shutdown request for a domain
that's already got one should do no harm.

Question is, why only do this every 10 seconds then? Might as well do it
on every iteration. So we could just roll the invocation of
"virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing
"while [ $NOW -lt $shutdown_timeout ]; do" loop.

What do others think?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
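
For discussion, the "shutdown on every iteration" idea above might look
roughly like the sketch below. It is not the RA's actual stop code: the
function name stop_with_retry and the VIRSH and POLL_INTERVAL variables are
hypothetical, introduced only so the loop can be tried without libvirt, and
the state check is reduced to a plain "shut off" comparison instead of
calling VirtualDomain_Status.

```shell
#!/bin/sh
# Sketch only: re-issue the graceful shutdown on every poll instead of every
# tenth one. VIRSH and POLL_INTERVAL are hypothetical knobs, not RA parameters.
: "${VIRSH:=virsh}"

# stop_with_retry DOMAIN MAX_TRIES
# Returns 0 once the domain reports "shut off", 1 after MAX_TRIES polls.
stop_with_retry() {
    domain=$1
    max_tries=$2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Per the discussion, repeating the request should do no harm if a
        # shutdown is already pending, and it covers the case where the guest
        # missed an earlier request (e.g. acpid not yet running).
        $VIRSH ${VIRSH_OPTIONS:-} shutdown "$domain" >/dev/null 2>&1
        state=$($VIRSH ${VIRSH_OPTIONS:-} domstate "$domain" 2>/dev/null)
        if [ "$state" = "shut off" ]; then
            return 0
        fi
        # With the default 1-second interval, MAX_TRIES is roughly a timeout
        # in seconds, like $shutdown_timeout in the quoted loop.
        sleep "${POLL_INTERVAL:-1}"
        tries=$((tries + 1))
    done
    return 1
}
```

Called as stop_with_retry "$DOMAIN_NAME" "$shutdown_timeout", a non-zero
return would presumably be the point where the RA falls back to force-stopping
(destroying) the domain.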