On Tuesday 18 September 2007, Spindler Michael wrote:
> Hi *,
>
> I've got a (hopefully) simple question:
>
> I have a 5-node cluster running 20 resources (single processes). I would like
> to have the following behavior: if a resource fails, it should be restarted
> on the same node, but at most 2 times; after that the resource should fail
> over to another node. The resource should not do an automatic failback once
> the failed host is up again.
>
> I have tried the following:
> - default_resource_failure_stickiness set to -1
> - resource_stickiness set to 3 (on each resource)
> - no places or other constraints configured.
>
> According to http://linux-ha.org/v2/faq/forced_failover we should get:
>
> (stickiness) / abs(failure stickiness) = maximum number of times a resource
> can fail before it is moved to another node.
>
> So in my case: 3 / abs(-1) = 3
>
> But my resources fail over to other nodes immediately after the first
> failure.
>
>
> Anyone here who is able to help me with this failover-scenario?
First of all, always provide the file created by the pengine which led to the 
failover - then we can give you an answer ;-)  (see below for an explanation).

The best way to tackle this kind of error is the following method:

* trigger a resource failure
* check the ha-log to see which CIB status file was written at the failover 
(grep "PEngine Input stored" /var/log/ha-log); they are usually stored 
in /var/lib/heartbeat/pengine
* use ptest and the written file to check what is wrong
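The grep step above can be sketched like this (the log line below is a made-up example of the message format; the exact wording in your ha-log may differ between heartbeat versions):

```shell
# Hypothetical ha-log line (format may vary between versions):
line='pengine[4242]: 2007/09/18_10:00:00 info: process_pe_message: Transition 12: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-239.bz2'

# Pull out just the path of the PE input file that was written:
echo "$line" | grep -o '/var/lib/heartbeat/pengine/pe-input-[0-9]*\.bz2'
# prints: /var/lib/heartbeat/pengine/pe-input-239.bz2
```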

!!! This assumes you did not disable the writing of such files (by the way, you 
can set the number of files kept with these cluster properties):

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="cib-bootstrap-options-pe-error-series-max"
              name="pe-error-series-max" value="-1"/>
      <nvpair id="cib-bootstrap-options-pe-warn-series-max"
              name="pe-warn-series-max" value="400"/>
      <nvpair id="cib-bootstrap-options-pe-input-series-max"
              name="pe-input-series-max" value="200"/>
    </attributes>
  </cluster_property_set>
</crm_config>
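As a sketch, these properties can also be set from the command line instead of editing the CIB XML by hand, assuming the crm_attribute tool from heartbeat 2.x is available; the option names below are from the 2.x tooling and worth double-checking against `crm_attribute --help` on your version:

```shell
# Keep the last 200 pe-input files (hypothetical invocation; verify the
# option spelling against your installed version before relying on it):
crm_attribute --type crm_config --attr-name pe-input-series-max --attr-value 200
```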

Assuming the info was stored in /var/lib/heartbeat/pengine/pe-input-239.bz2:
cd /var/lib/heartbeat/pengine

* to see the scores for nodes/resources:
ptest -X pe-input-239.bz2 -VVVVVVVV 2>&1 | grep resource_node_score | less

* to see which constraints/rules lead to the score:
ptest -X pe-input-239.bz2 -VVVVVVVV 2>&1 | grep test_expression | less

ptest -X pe-input-239.bz2 -VVVVVVVV 2>&1 | grep native_rsc_location | less

I think you get the point ...
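As for the arithmetic from the FAQ you quoted: with your settings each failure lowers the score on the current node by 1, so a quick sketch of how the effective preference decays (assuming no other constraints contribute to the score) is:

```shell
stickiness=3
failure_stickiness=-1   # default_resource_failure_stickiness

# Effective preference for the current node after n failures:
for n in 1 2 3; do
  echo "after $n failure(s): $(( stickiness + n * failure_stickiness ))"
done
# prints:
# after 1 failure(s): 2
# after 2 failure(s): 1
# after 3 failure(s): 0
```

The resource is expected to stay put while this value stays above the score on the other nodes; ptest shows the actual scores in your cluster, which is why checking the pe-input file is the reliable way to see why it moved after the first failure.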



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
