Guy wrote:
Hi,

I'm still testing a setup with 2 nodes and a primary partition on each
machine, with the secondary on the opposite machine. I'm running drbd8
with Heartbeat 2.1.3 + CRM. It works fine if one of the machines fails
completely (holding in the power button until it shuts off, or stopping Heartbeat).
I'm running into a problem when I just pull the network cable on one
machine and then plug it in again after a while. Obviously, nothing has
failed as far as Heartbeat is concerned, so the "failed" node takes
back its primary drbd partition and causes a split brain when I plug
the cable back in. Would I need to add something like a pingd
primitive and base the promotion of a drbd partition on the result
from pingd?

Do you have a secondary communication line for Heartbeat, such as a serial cable or a crossover network cable? If so, you need to look into dopd (the DRBD outdate-peer daemon). With dopd, one node can use Heartbeat's secondary communication channel to tell the other node that its copy of the data has been outdated.
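The dopd idea can be illustrated with a toy model of a single DRBD resource. This is a simplified sketch, not dopd itself. The `Node` class, the `outdate_peer` function and the two-value disk state are hypothetical names made up for this illustration. The point it shows is this: once the replication link drops, the primary marks the peer as Outdated over the second path, and an Outdated node is not allowed to become primary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str = "Secondary"        # "Primary" or "Secondary" for one DRBD resource
    disk_state: str = "UpToDate"   # simplified: only "UpToDate" or "Outdated"

    def can_promote(self) -> bool:
        # An Outdated node must not be promoted (stale data)
        return self.disk_state == "UpToDate"


def outdate_peer(primary: Node, peer: Node, secondary_channel_up: bool) -> bool:
    """Hypothetical model of dopd's job when the DRBD replication link drops.

    The primary asks the peer, over Heartbeat's other communication path,
    to mark its disk Outdated. Returns True if the peer was outdated.
    """
    if primary.role != "Primary":
        return False  # only the side still serving data needs to outdate its peer
    if not secondary_channel_up:
        return False  # peer unreachable: nothing stops it promoting itself later
    peer.disk_state = "Outdated"
    return True


# Cable pulled: replication link is down, but a serial/crossover link still works
node1 = Node("node1", role="Primary")
node2 = Node("node2")
outdate_peer(node1, node2, secondary_channel_up=True)
print(node2.can_promote())  # False
```

If there is no second channel, `outdate_peer` returns False and the peer keeps its UpToDate state. That is the split-brain situation described in the question.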

Then, yes, you need some sort of pingd-based rule so that the node that has lost its network connection knows it is in a weaker state than the other one.
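The decision a pingd-based rule makes can be sketched as follows. In a real cluster, pingd publishes a node attribute (often named `pingd`) whose value grows with the number of ping nodes that node can reach, and a location rule in the CIB then bans the DRBD master role where that attribute is missing or zero. The sketch below only models that decision. The function names and the default multiplier of 100 are assumptions for illustration, not Heartbeat's API.

```python
from typing import Dict, Optional


def pingd_value(reachable_ping_nodes: int, multiplier: int = 100) -> int:
    # Assumed model: pingd publishes multiplier * number of reachable ping nodes
    return multiplier * reachable_ping_nodes


def choose_master(pingd_attrs: Dict[str, Optional[int]]) -> Optional[str]:
    """Hypothetical: pick the node allowed to hold the DRBD master role.

    Nodes whose pingd attribute is missing (None) or <= 0 are banned,
    mirroring a location rule that scores the master role -INFINITY there.
    """
    eligible = {n: v for n, v in pingd_attrs.items() if v is not None and v > 0}
    if not eligible:
        return None
    # Prefer the best-connected node; ties go to the alphabetically first name
    return max(sorted(eligible), key=lambda n: eligible[n])


# node1 has had its cable pulled; node2 can still reach the ping node
attrs = {"node1": pingd_value(0), "node2": pingd_value(1)}
print(choose_master(attrs))  # node2
```

Banning the master role outright (rather than just lowering its score) keeps an isolated node from promoting itself even when it is the only node it can see.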

With both of these in place, you should be able to have a very reliable cluster without the need for STONITH. You will still need to make sure that, should the problem fix itself, the isolated node doesn't come back online and start messing things up.

Paul
(aka Gargoyle)
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
