On 2010-03-16 11:38, Michael Schwartzkopff wrote:
> [...]
>>>> I can see in the traces that as soon as the VM start succeeds (virsh
>>>> start returns rc=0 in VirtualDomain), there is a status request from
>>>> VirtualDomain which returns status=running, and just after that
>>>> several monitoring requests using the monitor_scripts given in the
>>>> configuration, i.e. /usr/sbin/hakvm (a script of mine which connects
>>>> to the VM over ssh).
>>>>
>>>> The problem is that the VM is "starting", but it takes a while (about
>>>> 30s) before it is really "started" and reachable, and the monitoring
>>>> requests from Pacemaker should not be launched before that.
>>>>
>>>> I tried adjusting the op monitor values, but that does not seem to be
>>>> related to my problem ...
>>>>
>>>> What am I missing?
>>> Hi,
>>>
>>> Add a start-delay to your monitoring operation. Your monitoring will
>>> then only start after that delay, by which time the virtual server is
>>> able to answer your requests.
>> No, fix the agent instead :-)
> 
> Yes. That would be the final solution. My suggestion is a quick and dirty
> workaround.

One in which you suggest using an option that everyone else agrees is
deprecated.

Now, Alain, let's take a look here:

VirtualDomain_Start() {
    if VirtualDomain_Status; then
        ocf_log info "Virtual domain $DOMAIN_NAME already running."
        return $OCF_SUCCESS
    fi

    virsh $VIRSH_OPTIONS start ${DOMAIN_NAME}
    rc=$?
    if [ $rc -ne 0 ]; then
        ocf_log error "Failed to start virtual domain ${DOMAIN_NAME}."
        return $OCF_ERR_GENERIC
    fi

    while ! VirtualDomain_Monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}

and

VirtualDomain_Monitor() {
    # First, check the domain status. If that returns anything other
    # than $OCF_SUCCESS, something is definitely wrong.
    VirtualDomain_Status
    rc=$?
    if [ ${rc} -eq ${OCF_SUCCESS} ]; then
        # OK, the generic status check turned out fine.  Now, if we
        # have monitor scripts defined, run them one after another.
        for script in ${OCF_RESKEY_monitor_scripts}; do
            script_output="$($script 2>&1)"
            script_rc=$?
            if [ ${script_rc} -ne ${OCF_SUCCESS} ]; then
                # A monitor script returned a non-success exit
                # code. Stop iterating over the list of scripts, log a
                # warning message, and propagate $OCF_ERR_GENERIC.
                ocf_log warn "Monitor command \"${script}\" for domain ${DOMAIN_NAME} returned ${script_rc} with output: ${script_output}"
                rc=$OCF_ERR_GENERIC
                break
            else
                ocf_log debug "Monitor command \"${script}\" for domain ${DOMAIN_NAME} completed successfully with output: ${script_output}"
            fi
        done
    fi
    return ${rc}
}

IOW: start spins on monitor until monitor succeeds, so if you have a
monitor script defined, that script must not return 0 unless the virtual
domain is effectively up.
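
For illustration only, a monitor script that honours this contract could
look roughly like the sketch below. It is not Alain's hakvm script, which
was not posted; the function names, the ssh probe, and the PROBE_TIMEOUT
variable are all assumptions made up for the example.

```shell
#!/bin/sh
# Sketch of a monitor script suitable for OCF_RESKEY_monitor_scripts.
# Assumptions (not from this thread): the guest is probed over ssh, and
# probe_guest, monitor_guest and PROBE_TIMEOUT are hypothetical names.

# Probe the guest once. Succeeds only if an ssh session can run "true".
probe_guest() {
    ssh -o BatchMode=yes -o ConnectTimeout="${PROBE_TIMEOUT:-5}" "$1" true \
        >/dev/null 2>&1
}

# Return 0 only when the guest answers. Any probe failure becomes a
# non-zero return code, which VirtualDomain_Monitor treats as a failure.
monitor_guest() {
    if probe_guest "$1"; then
        return 0
    fi
    echo "guest $1 not reachable yet"
    return 1
}
```

A real script would end with a call such as monitor_guest followed by the
guest's host name, so that the function's return code becomes the script's
exit code. Since the agent invokes each entry of monitor_scripts without
arguments, the host name would have to be fixed inside the script.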

However, status (but not monitor) returns 0 as soon as the domain has
been started from virsh's perspective, disregarding the monitor script
(that's why it's called a _monitor_ script, not a status script).
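
The difference can be reproduced with a small self-contained simulation.
This is not the real agent: the *_stub functions, BOOT_POLLS and polls are
hypothetical stand-ins for virsh and the monitor script, and the sleep
from the real start loop is left out.

```shell
#!/bin/sh
# Simulation only: stubs replace virsh and the monitor script.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
BOOT_POLLS=3    # pretend the guest needs three polls before it answers
polls=0

# Like "status": succeeds as soon as the hypervisor reports the domain
# as running, whether or not the guest is reachable.
status_stub() { return $OCF_SUCCESS; }

# Like a monitor script: fails until the guest has "booted".
monitor_script_stub() {
    polls=$((polls + 1))
    [ "$polls" -gt "$BOOT_POLLS" ]
}

# Simplified shape of VirtualDomain_Monitor: status first, then the script.
monitor_stub() {
    status_stub
    rc=$?
    [ "$rc" -eq "$OCF_SUCCESS" ] || return "$rc"
    monitor_script_stub || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Same loop as at the end of VirtualDomain_Start, without the sleep.
start_stub() {
    while ! monitor_stub; do
        :
    done
    return $OCF_SUCCESS
}
```

With these stubs, status_stub succeeds immediately, whereas start_stub
only returns after monitor_script_stub has been polled four times (three
failures, then one success).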

Andrew: how is the RA broken here? Since probe uses monitor, the monitor
script does apply to a probe. So that can't be the problem I suppose.
But here Alain is talking about a status operation -- where does that
come from, and how is it relevant?

I'd be happy to fix this if and where it's broken, I just fail to see
the breakage here. All insights appreciated.

Cheers,
Florian
