For fencing to work, it must actually cut off the other node. Simply
returning a fake success will cause a split-brain.
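To make the point concrete, here is a toy Python simulation. All names in it (`Node`, `fake_fence`, `real_fence`, `split`) are hypothetical, and this is not how Pacemaker or any real fence agent is implemented. It assumes the surviving node takes over openvpn only after its fence call reports success.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Node:
    """Hypothetical node model: just a name and the resources it runs."""
    name: str
    running: Set[str] = field(default_factory=set)


def fake_fence(peer: Node, device_reachable: bool) -> bool:
    # Hypothetical "fake" agent: reports success without touching the peer.
    return True


def real_fence(peer: Node, device_reachable: bool) -> bool:
    # Hypothetical real agent: it can only cut the peer off through its
    # fence device. If the device is unreachable, it must report failure.
    if device_reachable:
        peer.running.clear()  # the peer is really powered off
        return True
    return False


def split(fence: Callable[[Node, bool], bool], device_reachable: bool) -> List[str]:
    """Simulate second-node losing contact with first-node, which runs openvpn.

    Returns the names of the nodes running openvpn afterwards."""
    first = Node("first-node", {"openvpn"})
    second = Node("second-node")
    # second-node must fence its peer before taking over the resource;
    # recovery proceeds only on a confirmed (returned True) fence.
    if fence(first, device_reachable):
        second.running.add("openvpn")
    # first-node keeps its resources unless the fence really powered it off.
    return [n.name for n in (first, second) if "openvpn" in n.running]


print(split(fake_fence, device_reachable=False))  # ['first-node', 'second-node']
print(split(real_fence, device_reachable=False))  # ['first-node']
print(split(real_fence, device_reachable=True))   # ['second-node']
```

A real agent either kills the peer or reports failure, and a failed fence leaves recovery blocked. Only the fake agent ends with openvpn running twice.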
On 27/02/14 12:38 PM, TRIBOLET Thomas wrote:
Hi,
Thanks for your response.
I'll try the booth plugin. It's in the Debian repository.
If it doesn't work, I'll try making a fake fencing agent that only restarts
corosync (it seems to fix the problem).
Thanks
________________________________________
From: [email protected] [[email protected]]
on behalf of Digimer [[email protected]]
Sent: Thursday, 27 February 2014 17:05
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] 2 Nodes split brain, distant sites
On 27/02/14 09:42 AM, TRIBOLET Thomas wrote:
2) My problem:
When there is a network problem, for example:
a) The first-node site loses its internet connection (and, at the same time,
communication with second-node, because the VPN runs over that internet connection).
b) The cluster stops openvpn on first-node and starts it on second-node because
of the p_ping primitive in the configuration.
c) The connection comes back at the first-node site.
d) Problem: first-node and second-node don't rejoin the cluster; they don't see
each other and each forms its own cluster -> split-brain, I think.
e) Each node has openvpn running, which shouldn't happen.
I don't have STONITH running because I think it would be problematic
without quorum.
Is there a way to tell corosync to recreate the ring?
Or does someone have another solution?
Thanks
Hello,
This is the fundamental problem of "stretch" clusters (or
geo-clusters). There is no way to tell the difference between a site
failure and a network failure. In either case the link is down, so neither
node can reach the other site's fence device and fencing can't be used.
Without fencing, there is no way to avoid a split-brain.
As for quorum: when quorum isn't used, fencing becomes *more*
important. Even then, quorum and fencing solve different problems.
Quorum is useful when nodes are acting in a defined manner. Fencing is
needed when a node is in an unknown state (and thus acting in an
undefined manner).
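Quorum is normally a simple vote majority. The Python sketch below (the `has_quorum` helper is hypothetical, and it assumes one vote per node) shows why quorum alone cannot pick a survivor in a two-node cluster: when the link drops, neither half holds more than half the votes.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Simple majority: a partition is quorate only with more than half the votes."""
    return votes_present > total_votes // 2


# Two-node cluster, both nodes talking: 2 of 2 votes.
print(has_quorum(2, 2))  # True
# Link between the sites drops: each side now holds 1 of 2 votes.
print(has_quorum(1, 2))  # False (on both sides)
```

This is why two-node clusters are commonly run with quorum disabled, which leaves fencing as the only safeguard.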
So regardless of quorum, fencing is required. It is the only way to
reliably avoid split-brains. Unfortunately, fencing doesn't work on
stretch clusters.
The pacemaker project is working on something called "booth" which is
designed to deal with this problem, but I don't know much about it, or
whether it's out of testing/dev yet.
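The general idea behind arbitrator-based geo-cluster tools such as booth is to add a third vote, an arbitrator at a separate location, so that one site can be granted a ticket while an isolated site cannot. The Python sketch below is only a simplified majority model of that idea, under that assumption; `VOTERS`, `can_hold_ticket` and the site names are hypothetical, and this is not booth's actual protocol or configuration.

```python
from typing import Set

# Hypothetical voters: two sites plus an arbitrator at a third location.
VOTERS = ("site-a", "site-b", "arbitrator")


def can_hold_ticket(site: str, reachable: Set[str]) -> bool:
    """A site may run the ticket-protected resource only if it, together with
    the other voters it can currently reach, forms a strict majority."""
    others = set(VOTERS) - {site}
    votes = 1 + len(reachable & others)
    return votes > len(VOTERS) // 2


# site-a loses its internet link and can reach nobody.
print(can_hold_ticket("site-a", set()))           # False
# site-b can still reach the arbitrator.
print(can_hold_ticket("site-b", {"arbitrator"}))  # True
```

The sketch only models the decision; in practice the isolated site must also stop its own copy of the resource once it can no longer reach a majority.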
So in short, if you must have a stretch cluster, I recommend manual
failover only.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
--
Digimer