Hello,

Having done some reading over the past week on Pacemaker and Corosync, I'd like to ask a few questions about points on which I'm still hazy.

I'm trying to set up a very simple active/passive cluster of two nodes. The nodes will be KVM guests, each running on a separate physical host/hypervisor. The operating system in all four cases will be Red Hat Enterprise Linux 6.2. I'll be using the RPMs that are part of the RHEL 6.2 distribution - Pacemaker 1.1.6 with Corosync 1.4.1.

This will - at the start - be a "poor man's" cluster: no SBD shared-disk fencing, no hardware controllers for STONITH, no dedicated network links. Basically the two physical machines will have just one, maybe two, network interfaces.

I'm happy to accept that some answers might be along the lines of "you can't have a serious cluster/STONITH in such circumstances". That may be part of my problem; I'm trying to determine when STONITH might make a difference, and when it might be quite superfluous for my little cluster. Right now my cluster is operational with 'stonith-enabled="false"'. I've worked with clusters before, but STONITH is new to me, hence my indecision.

Q1. Is STONITH of any use in my simple cluster?
If each node has only one or two network connections to the other, and the nodes will be using those very same connections to employ STONITH to bring down the other KVM node, is it worthwhile to set up STONITH? I gather that, if the networks fail and each node then believes the other is dead, both will try to use STONITH to bring down the other, but the STONITH will likewise fail.

But STONITH is also potentially of use in other (rarer?) cases: if something is wrong with the Corosync configuration of one node, say, or if a resource can't be stopped on one node for non-network reasons. In those cases - if there's no actual _network_ problem - STONITH will function correctly. So should I set up STONITH anyway?

Q2. If a STONITH agent fails to bring down a node, will the originating node continue to function as a cluster participant?
In the example above, if the network connection(s) fail between the two cluster nodes, I believe that each will try to bring down the other with STONITH. If each STONITH action _fails_ - because the network is down, and the STONITH agents will be configured to use the same network - will the nodes (both) try to run the cluster's resources?

I understand that, if they do, the cluster will be running in a split-brain condition. But that's actually what I would prefer; I'd rather have both nodes running the application in split-brain mode than have _neither_ running the application. (The application is largely 'read-only'; it's more important for readers to have access to it during any outage than to protect it from split-brain. Reconciling the split-brain data afterwards will be a simple case of picking one node's data to supersede the other's.)

If the answer to this is "a node will suicide if its STONITH agents fail to bring down other cluster nodes" then I guess it follows that my answer to Q1 will be "don't set up STONITH, then".

Q3. What STONITH agent do I use to control KVM cluster nodes?
Looking at agents that can be used to shoot KVM guests, I believe I have these options:

3.1 fence_virtd using its libvirt backend; except that only works for a single physical host/hypervisor, so I can't use it.

3.2 fence_virtd using its libvirt-qpid backend. However, I haven't been able to find any examples of how this is set up (a libvirt-qpid daemon is involved? Configuration of qpid/QMF on the hosts?). I saw that Adrian Allen posted a message to the Linux-HA mailing list a few days ago (the 1/9) asking about setting up qmf/qpid with fence_virt; like him, I haven't found any examples of people using it.

3.3 I _have_ found people referencing an 'external/libvirt' agent. Does anyone know where I can find this agent?

3.4 I've also seen references to the 'fence_virsh' agent, which comes with the RHEL 6.2 fence RPMs.
This agent seems to ssh from the originating node to the hypervisor hosting the to-be-killed node and then run virsh there to reboot the VM.

If I do go ahead with STONITH I assume I'll be using one of the above agents, unless someone recommends another. Can anyone point me to guidelines on configuring qpid/QMF with fence_virtd (3.2)? Or tell me where I can find external/libvirt (3.3)?

Many thanks for any help!
