Hi!

I don't have a real answer for this, but I can report another bad experience
with a 2-node cluster like yours:

If the DC is fenced, the other node tries to become DC. But if the fenced node
(which still thinks it is DC) reboots just before the surviving node has
completed its "ego trip", the two nodes cannot agree on who becomes DC. I then
have to reboot one of the nodes (or shut down OpenAIS on it). Seen in SLES11
SP2 (latest updates).

An idea for your problem: if the cluster counted "reboots within a
timeframe" (e.g. as a node attribute), the fencing operation could change from
reboot to poweroff. I don't know how to do it, though.
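
To make the idea concrete, here is a rough sketch of just the decision logic in
Python. It is not a Pacemaker feature or API: the function name, the 10-minute
window and the threshold of 2 are all made up for illustration, and how the
reboot timestamps would be recorded (e.g. in a node attribute) is left open.

```python
from typing import Sequence


def choose_fence_action(reboot_times: Sequence[float],
                        now: float,
                        window_s: float = 600.0,
                        max_reboots: int = 2) -> str:
    """Pick a fence action from a node's recent reboot history.

    Hypothetical helper, not part of Pacemaker. `reboot_times` are
    timestamps (in seconds) of earlier fence-triggered reboots of the
    node. Once `max_reboots` of them fall within the last `window_s`
    seconds, return "off" instead of "reboot" to break the cycle.
    """
    # Keep only reboots that happened inside the time window.
    recent = [t for t in reboot_times if 0 <= now - t <= window_s]
    return "off" if len(recent) >= max_reboots else "reboot"


if __name__ == "__main__":
    # Two reboots within the last 10 minutes: power the node off.
    print(choose_fence_action([900.0, 950.0], now=1000.0))
    # Only one recent reboot: a plain reboot is still acceptable.
    print(choose_fence_action([100.0, 950.0], now=1000.0))
```

With something like this, a node caught in a fence loop would stay powered off
after the second reboot until an administrator looks at it.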

Regards,
Ulrich

>>> Alex Sudakar <[email protected]> wrote on 03.09.2013 at 05:23 in
message
<calq2s-hxkq5ghv9bs1snnojk4gtnl1su-nzujpdxwosv2ap...@mail.gmail.com>:
> I've got a very simple question which I suspect betrays my lack of
> understanding of something basic.  Could someone help me understand?
> 
> If I have a two-node Pacemaker cluster - say, a really simple cluster
> of two nodes, A & B, with a solitary network connection between them -
> then I have to set no-quorum-policy to 'ignore'.  If the network
> connection is broken then both A & B will attempt to STONITH each
> other.
> 
> Is there anything that would stop an endless cycle of each killing the
> other if the actions of the STONITH agents are set to reboot?
> 
> I.e.:
> 
> -  A & B race to STONITH each other
> -  A kills B
> -  A assumes resources
> 
> -  B reboots
> -  B can't see A
> -  B kills A
> -  B assumes resources
> 
> -  A reboots
> -  A can't see B
> -  A kills B
> -  A assumes resources
> 
> ... etc.
> 
> It's to stop this sort of cycle that I've set my STONITH actions to
> 'off' rather than 'reboot'.
> 
> But I was reading the 'Fencing topology' document that Digimer
> referenced and I was reminded in my perusal that many people/clusters
> use a 'reboot' action.
> 
> For a simple quorum-less cluster of two nodes how do those clusters
> avoid a never-ending cycle of each node killing the other, if neither
> node can 'see' the other via corosync?
> 
> It's a very basic question; I think I'm forgetting something obvious.
> Thanks for any help!
> _______________________________________________
> Linux-HA mailing list
> [email protected] 
> http://lists.linux-ha.org/mailman/listinfo/linux-ha 
> See also: http://linux-ha.org/ReportingProblems 



