For fencing to work, it must actually cut off the other node. Simply returning a fake success will cause a split-brain.
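
A minimal Python sketch of that difference. Everything in it is hypothetical: `PowerSwitch` stands in for whatever device a real fence agent drives (IPMI, a switched PDU, etc.), and none of the names come from a real fence agent or from Pacemaker. It only models the rule above: an agent may report success once the peer is confirmed off, never before.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PowerSwitch:
    """Hypothetical power controller for the peer node (not a real API)."""
    reachable: bool = True     # can we talk to the device at all?
    peer_powered: bool = True  # is the peer node still running?

    def power_off(self) -> None:
        # The command only has an effect if the device is reachable.
        if self.reachable:
            self.peer_powered = False

    def confirmed_off(self) -> bool:
        # "Off" can only be confirmed by asking the device, so it must be reachable.
        return self.reachable and not self.peer_powered


def real_fence(switch: PowerSwitch) -> bool:
    """Cut the peer off, and report success only once that is confirmed."""
    switch.power_off()
    return switch.confirmed_off()


def fake_fence(switch: PowerSwitch) -> bool:
    """Report success without touching the peer (e.g. only restart corosync)."""
    return True


def nodes_running_service(fence: Callable[[PowerSwitch], bool], switch: PowerSwitch) -> int:
    """The peer runs the service; the survivor takes over only if fencing 'succeeded'."""
    survivor_runs = fence(switch)
    peer_runs = switch.peer_powered  # read after fencing: a cut-off peer stops
    return int(survivor_runs) + int(peer_runs)


for agent in (real_fence, fake_fence):
    for reachable in (True, False):
        count = nodes_running_service(agent, PowerSwitch(reachable=reachable))
        print(agent.__name__, "reachable" if reachable else "unreachable", count)
```

With the fake agent the count is 2 in every case: both nodes run the service, which is the split-brain. With the real agent, an unreachable device means the survivor never takes over, which is safe but needs a human to recover.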

On 27/02/14 12:38 PM, TRIBOLET Thomas wrote:
Hi,

Thanks for your response.

I'll try the booth plugin. It's in the Debian repository.

If it doesn't work, I'll try making a fake fencing agent that only restarts
corosync (it seems to fix the problem).


Thanks
________________________________________
From: [email protected] [[email protected]]
on behalf of Digimer [[email protected]]
Sent: Thursday, 27 February 2014 17:05
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] 2 Nodes split brain, distant sites

On 27/02/14 09:42 AM, TRIBOLET Thomas wrote:
2) My problem:

When there is a network problem:

Example:
a) The first node's site loses its internet connection (and, because the VPN
runs over that connection, its communication with the second node at the same time).
b) The cluster stops OpenVPN on the first node and starts it on the second,
because of the p_ping primitive in the config.
c) The connection comes back at the first node's site.
d) Problem: the two nodes do not re-form the cluster. They don't see each
other and each one creates its own cluster -> split-brain, I think.
e) Each node has OpenVPN running, which shouldn't happen.
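
The sequence above can be modelled in a few lines of Python. This is a simplification under stated assumptions, not how Pacemaker actually computes placement: each partition is a set of nodes that can see each other, a partition runs OpenVPN on its first member whose ping check passes (a hypothetical stand-in for the p_ping location rule), and there is no fencing or quorum to stop a partition from acting alone.

```python
from typing import Optional


def place_vpn(partition: set[str], ping_ok: dict[str, bool]) -> Optional[str]:
    """Where one partition would run OpenVPN: its first member with working ping."""
    eligible = sorted(node for node in partition if ping_ok[node])
    return eligible[0] if eligible else None


def vpn_instances(partitions: list[set[str]], ping_ok: dict[str, bool]) -> list[str]:
    """Every partition decides on its own; nothing stops two of them acting at once."""
    placed = (place_vpn(p, ping_ok) for p in partitions)
    return sorted(node for node in placed if node is not None)


# Normal operation: one ring, both sites online.
print(vpn_instances([{"node1", "node2"}], {"node1": True, "node2": True}))
# a) + b): site 1 loses its uplink, the ring splits and node1's ping fails.
print(vpn_instances([{"node1"}, {"node2"}], {"node1": False, "node2": True}))
# c) to e): the uplink is back, but the two rings never merge again.
print(vpn_instances([{"node1"}, {"node2"}], {"node1": True, "node2": True}))
```

The last call returns both nodes: once the partitions exist independently, each one sees a healthy ping and no peer, so each starts its own OpenVPN, as in step e).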


I don't have STONITH enabled because I think it would be problematic without
quorum.
Is there a way to tell corosync to recreate the ring?

Or does anyone have another solution?

Thanks

Hello,

    This is the fundamental problem of "stretch" clusters (or
geo-clusters). There is no way to tell the difference between a site
failure and a network failure. In either case, the link is down, so
fencing can't be used. Without fencing, there is no way to avoid a
split-brain.
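
A small Python sketch of why the two cases can't be told apart. It assumes, hypothetically, that the surviving node reaches both the peer and the peer's fence device only over the WAN link; the scenario names and the `Observation` fields are illustrative, not taken from corosync or Pacemaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What the surviving node at site 2 can actually see of site 1."""
    peer_heartbeat: bool
    fence_device_answers: bool


def observe(scenario: str) -> Observation:
    """Assumes site 1's node and its fence device are reached only over the WAN."""
    if scenario == "all_up":
        return Observation(peer_heartbeat=True, fence_device_answers=True)
    if scenario in ("site1_destroyed", "wan_link_down"):
        # Either way, nothing from site 1 gets through to site 2.
        return Observation(peer_heartbeat=False, fence_device_answers=False)
    raise ValueError(f"unknown scenario: {scenario}")


def safe_action(obs: Observation) -> str:
    """The only safe choice for each possible observation."""
    if obs.peer_heartbeat:
        return "no action"
    if obs.fence_device_answers:
        # Normal single-site case: the fence device is on a separate path.
        return "fence peer, then recover"
    return "block and wait for an operator"


for scenario in ("all_up", "site1_destroyed", "wan_link_down"):
    print(scenario, "->", safe_action(observe(scenario)))
```

Because `observe` returns the same value for a destroyed site and a cut link, any rule that recovers automatically in the first case also starts a second copy in the second case.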

    As for quorum: when quorum isn't used, fencing becomes *more*
important. Even then, quorum and fencing solve different problems.
Quorum is useful when nodes are acting in a defined manner. Fencing is
needed when a node is in an unknown state (and thus acting in an
undefined manner).
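
For the arithmetic behind this, a Python sketch assuming one vote per node and a plain majority rule (real quorum providers may add special handling for two-node clusters, which this ignores). After a two-node split neither half holds a majority, which is why two-node clusters commonly run with quorum ignored, leaving fencing to do all the work.

```python
def has_quorum(partition_votes: int, expected_votes: int) -> bool:
    """Plain majority: strictly more than half of all expected votes."""
    return 2 * partition_votes > expected_votes


def split_outcome(partition_sizes: list[int]) -> list[bool]:
    """Quorum status of each partition after a split, with one vote per node."""
    total = sum(partition_sizes)
    return [has_quorum(size, total) for size in partition_sizes]


print(split_outcome([1, 1]))  # two nodes split 1/1 -> [False, False]
print(split_outcome([2, 1]))  # three nodes split 2/1 -> [True, False]
```

The model also assumes every node honours its own quorum result. A hung or misbehaving node doesn't, and that undefined case is the one only fencing covers.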

    So regardless of quorum, fencing is required. It is the only way to
reliably avoid split-brains. Unfortunately, fencing doesn't work on
stretch clusters.

    The Pacemaker project is working on something called "booth", which is
designed to deal with this problem, but I don't know much about it, or
whether it's out of testing/dev yet.

    So in short, if you must have a stretch cluster, I recommend manual
failover only.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?