On Tue, Sep 18, 2012 at 9:49 PM, Olivier BONHOMME <[email protected]> wrote:
> Hello the list,
>
> New user to the HA world, I am trying to create an active/passive
> cluster, unfortunately with Heartbeat 2.1.4 for historical reasons.

Stop. Please do not use those old releases.
They're terrible; the bugs fixed since then must number in the
hundreds.

Failover due to migration is one area in particular that got fixed (by
ditching the truly awful idea of failure stickiness).
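For reference, current Pacemaker expresses this policy with the migration-threshold and failure-timeout resource meta attributes instead of failure stickiness. A rough sketch in crm shell syntax (group_1 and node0 are the names from your config; the threshold and timeout values are only examples):

```shell
# Move a resource away from a node after 5 failures there, and
# expire recorded failures automatically after 10 minutes.
crm configure rsc_defaults migration-threshold=5 failure-timeout=10min

# Manually clear the fail-count so the resource may run on node0 again.
crm resource cleanup group_1 node0
```

With migration-threshold the fail-count is tracked per node, so once it is cleaned up (or expires) that node becomes eligible again; no -INFINITY location score lingers the way it does with the old stickiness arithmetic.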

>
> My resource stack is composed of a resource group with the collocated and
> ordered flags set to true. This group is composed of the following
> components:
>
> - DRBD Disk
> - VIP
> - slapd
> - postgresql
> - application
>
> I am trying to implement the following scenario: on a service failure,
> the resource is restarted, and after 5 failures it moves to the other node.
>
> To do that, I configured a resource_failure_stickiness value of 100
> on the group and created the following location rule:
>
> <constraints>
>    <rsc_location id="location_failure" rsc="group_1">
>          <rule id="prefered_location_failure" score="-500"/>
>    </rsc_location>
> </constraints>
>
> After that, I start my cluster and simulate 5 failures on node 0.
> After these 5 failures, the cluster switches to node 1, which seems
> to be the expected behaviour.
>
> Then I simulate 5 more failures on node 1, and after these
> failures, node 1 stops and there is no switchover back to node 0. That
> seems normal, since there is still a fail-count of 5 for node 0 as
> well as for node 1.
>
> Now I want to restart the resources on node 0, so I cleared the
> fail-count for the resource, but the cluster does not want to start
> them anymore. The same happens if I do it on node 1.
>
> A crm_verify -LV shows me that the resources cannot run anywhere, and if
> I use ptest in very verbose mode, I see the following message:
>
> ptest[10784]: 2012/09/18_15:39:37 debug: native_assign_node: All nodes
> for resource drbddisk_1 are unavailable, unclean or shutting down
> (node1: 1, -1000000).
>
> At a more verbose level, I see that both nodes seem to have a
> '-INFINITY' score:
> ptest[11819]: 2012/09/18_15:40:53 debug: debug2: node_list_update:
> node1: -1000000 + 0
> ptest[11819]: 2012/09/18_15:40:53 debug: debug2: node_list_update:
> node0: -1000000 + 0
>
> Restarting Heartbeat did not bring my cluster back to a nominal state.
> That is why I have several questions for this ML:
>
> - Is there a known way to put my cluster back into a nominal state?
> - Did I correctly understand the resource_failure_stickiness concept?
> - Could this be a bug in the 2.1.4 version that forces me to move to
> Heartbeat 3 + Pacemaker?
>
> Thanks in advance for your answers.
>
> Regards,
> Olivier BONHOMME
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems