On 27/02/14 09:42 AM, TRIBOLET Thomas wrote:
2) My problem:

When there is a network problem:

Example:
a) The first-node site loses its internet connection (and, at the same time,
communication with second-node, because the VPN runs over that connection).
b) The cluster stops openvpn on the first node and starts it on the second,
because of the p_ping primitive in the config.
c) The connection comes back at the first-node site.
d) Problem: first-node and second-node don't re-form the cluster. They don't
see each other, and each node forms its own cluster -> split brain, I think.
e) Each node has openvpn running, which shouldn't happen.


I don't have stonith running because I think it will be problematic
without quorum.
Is there a way to tell corosync to recreate a ring?

Or does someone have another solution?

Thanks

Bonjour,

This is the fundamental problem of "stretch" clusters (or geo-clusters). There is no way to tell the difference between a site failure and a network failure. In either case, the link is down, so fencing can't be used. Without fencing, there is no way to avoid a split-brain.
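To make the ambiguity concrete, here is a minimal sketch in Python. It is only a toy model, not anything corosync or pacemaker actually runs, and the function and variable names are invented for illustration. It assumes a node learns about its peer solely through heartbeats sent over the one inter-site link.

```python
from itertools import product


def heartbeat_received(peer_alive: bool, link_up: bool) -> bool:
    """Toy model: a heartbeat arrives only if the peer runs AND the link works."""
    return peer_alive and link_up


# Group every (peer_alive, link_up) scenario by what the local node observes.
scenarios_by_observation = {}
for peer_alive, link_up in product([True, False], repeat=2):
    seen = heartbeat_received(peer_alive, link_up)
    scenarios_by_observation.setdefault(seen, []).append((peer_alive, link_up))

# All scenarios that look identical ("no heartbeat") from the local node.
ambiguous = scenarios_by_observation[False]
print(ambiguous)
```

The "no heartbeat" bucket contains both the dead-peer case and the live-peer-with-dead-link case, so the surviving node cannot tell from that observation alone whether it is safe to take over.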

As for quorum: when quorum isn't used, fencing becomes *more* important. Even then, quorum and fencing solve different problems. Quorum is useful when nodes are acting in a defined manner. Fencing is needed when a node is in an unknown state (and thus acting in an undefined manner).

So regardless of quorum, fencing is required. It is the only way to reliably avoid split-brains. Unfortunately, fencing doesn't work on stretch clusters.
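As a rough illustration of why quorum alone doesn't settle a two-node split, here is a small Python sketch of the usual strict-majority rule. The helper name is made up for this example and it assumes one vote per node; it is not the votequorum implementation.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Strict majority: a partition needs more than half of all votes."""
    return votes_seen > total_votes // 2


# Two-node cluster after the link drops: each side sees only its own vote.
two_node_split = [has_quorum(1, 2), has_quorum(1, 2)]

# Three-node cluster split 2/1: exactly one side keeps quorum.
three_node_split = [has_quorum(2, 3), has_quorum(1, 3)]

print(two_node_split, three_node_split)
```

With two nodes, neither half of a split has a majority, so either both sides stop or, if quorum is ignored, both sides carry on. With three votes, one side wins, but that still says nothing about what the losing node is actually doing, which is the part fencing covers.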

The pacemaker project is working on something called "booth" which is designed to deal with this problem, but I don't know much about it, or whether it's out of testing/dev yet.

So in short, if you must have a stretch cluster, I recommend manual failover only.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
