Hi,

I'm still testing a setup with 2 nodes, with a primary partition on each machine and its secondary on the opposite machine. I'm running drbd8 with Heartbeat 2.1.3 + CRM. It works fine if one of the machines fails completely (holding in the power button until it shuts off, or stopping Heartbeat).

I'm running into a problem when I just pull the network cable on one machine and then plug it in again after a while. Obviously nothing has failed as far as Heartbeat is concerned, so the "failed" node takes back its primary drbd partition and causes a split brain when I plug the cable back in.

Would I need to add something like a pingd primitive and base the promotion of a drbd partition on the result from pingd? Or would STONITH be what I need to look at? I've not looked at STONITH at all yet.
If/when I use this in a production environment, the machines are going to be administered remotely, so they need to sort this kind of thing out from rules rather than from intervention by me.

I have one other stupid question: once I've brought a failed node back up and checked that it's OK, how do I switch the primary partition back to the recovered node? Do I make the change through drbdadm, or with crm_resource and the related programs? Like I said, the machines would be administered remotely, so I can't use a GUI. I need to be able to do it all from the command line.

Thanks,
Guy

-- 
Don't just do something...sit there!
