On 2010-03-16 11:38, Michael Schwartzkopff wrote:
> [...]
>>>> I can see in traces that as soon as the start vm is launched
>>>> successfully : with virsh start ( rc=0 ) in VirtualDomain, there is a
>>>> status request from VirtualDomain which returns : status=running
>>>> and juste after several monitoring requests with the monitor_scripts
>>>> given at configuration so /usr/sbin/hakvm (a script of mine which does
>>>> ssh towards the vm).
>>>>
>>>> The problem is that the vm is "starting" but it takes a while before to
>>>> get really "started" and reachable (about 30s) and the monitoring
>>>> request from Pacemaker should not be launched before that.
>>>>
>>>> I tried to play on op monitor values but it seems to not be relative to
>>>> my problem ...
>>>>
>>>> Where do I miss something ?
>>> Hi,
>>>
>>> Add a start-delay to your monitoring operation. You monitoring will only
>>> start after that delay when the virtual server is able to answer your
>>> requests.
>> No, fix the agent instead :-)
>
> Yes. That would be the final solution. Me suggestion is a quick and Dirty
> workaround.

One where you suggest using an option that everyone else agrees is
deprecated.
Now, Alain, let's take a look here:

VirtualDomain_Start() {
    if VirtualDomain_Status; then
        ocf_log info "Virtual domain $DOMAIN_NAME already running."
        return $OCF_SUCCESS
    fi

    virsh $VIRSH_OPTIONS start ${DOMAIN_NAME}
    rc=$?
    if [ $rc -ne 0 ]; then
        ocf_log error "Failed to start virtual domain ${DOMAIN_NAME}."
        return $OCF_ERR_GENERIC
    fi

    while ! VirtualDomain_Monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}

and

VirtualDomain_Monitor() {
    # First, check the domain status. If that returns anything other
    # than $OCF_SUCCESS, something is definitely wrong.
    VirtualDomain_Status
    rc=$?
    if [ ${rc} -eq ${OCF_SUCCESS} ]; then
        # OK, the generic status check turned out fine. Now, if we
        # have monitor scripts defined, run them one after another.
        for script in ${OCF_RESKEY_monitor_scripts}; do
            script_output="$($script 2>&1)"
            script_rc=$?
            if [ ${script_rc} -ne ${OCF_SUCCESS} ]; then
                # A monitor script returned a non-success exit
                # code. Stop iterating over the list of scripts, log a
                # warning message, and propagate $OCF_ERR_GENERIC.
                ocf_log warn "Monitor command \"${script}\" for domain ${DOMAIN_NAME} returned ${script_rc} with output: ${script_output}"
                rc=$OCF_ERR_GENERIC
                break
            else
                ocf_log debug "Monitor command \"${script}\" for domain ${DOMAIN_NAME} completed successfully with output: ${script_output}"
            fi
        done
    fi
    return ${rc}
}

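
To make the interaction between the two functions above easier to follow, here is a toy model of it. This is only a sketch: the fake_* functions and the "answers on the third try" behaviour are made up for illustration, nothing here calls virsh or the cluster, and the failure path of the real monitor function is simplified.

```shell
#!/bin/sh
# Toy model of the start/monitor interaction. All fake_* names are
# hypothetical stand-ins; no virsh, libvirt or Pacemaker is involved.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

attempts=0

# Stand-in for a monitor script: the "guest" only answers on the 3rd try.
fake_monitor_script() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Stand-in for VirtualDomain_Status: the domain counts as running at once.
fake_status() {
    return $OCF_SUCCESS
}

# Roughly the shape of VirtualDomain_Monitor: status first, then the script.
fake_monitor() {
    fake_status || return $OCF_ERR_GENERIC
    fake_monitor_script || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Same shape as the loop at the end of VirtualDomain_Start.
fake_start() {
    while ! fake_monitor; do
        :   # the real agent sleeps for one second here
    done
    return $OCF_SUCCESS
}

fake_start
start_rc=$?
echo "start rc=$start_rc after $attempts monitor calls"
```

In this model fake_status succeeds from the very first call, yet fake_start only returns once the monitor script has succeeded.
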
IOW: start spins on monitor, and if you have a monitor script defined it
must not return 0 unless the virtual domain is effectively up.
However, status (but not monitor) returns 0 if the domain has been
started as per virsh's perspective, disregarding the monitor script
(that's why it's called a _monitor_ script, not a status script).
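
For illustration, a monitor script that behaves this way could look roughly like the sketch below. I have not seen /usr/sbin/hakvm, so this is not that script: GUEST_HOST, CHECK_CMD and guest_is_up are names invented for the example, and guest.example.com is a placeholder.

```shell
#!/bin/sh
# Hypothetical monitor script sketch; not part of the VirtualDomain agent.
# GUEST_HOST and CHECK_CMD are made-up names for this example.
GUEST_HOST="${GUEST_HOST:-guest.example.com}"

# Command used to probe the guest. BatchMode avoids password prompts and
# ConnectTimeout keeps a single probe from hanging indefinitely.
CHECK_CMD="${CHECK_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5 $GUEST_HOST true}"

# Returns 0 only if the probe command succeeds, non-zero otherwise.
guest_is_up() {
    # Unquoted on purpose so the command string is split into words.
    $CHECK_CMD >/dev/null 2>&1
}
```

A real script would end by calling guest_is_up and exiting with its return code, so that it reports failure for as long as the guest does not answer.
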
Andrew: how is the RA broken here? Since probe uses monitor, the monitor
script does apply to a probe. So that can't be the problem I suppose.
But here Alain is talking about a status operation -- where does that
come from, and how is it relevant?
I'd be happy to fix this if and where it's broken, I just fail to see
the breakage here. All insights appreciated.
Cheers,
Florian
