[Linux-HA] Request some basic advice about STONITH

Alex Sudakar Tue, 15 Jan 2013 19:28:22 -0800

Hello,

Having done some reading over the past week on Pacemaker and Corosync
I'd like to ask a few questions about points on which I'm still hazy.


I'm trying to set up a very simple active/passive cluster of two
nodes.  The nodes will be KVM VM guests, each running on its own
physical host/hypervisor.  The operating system in all four cases
will be Red Hat Enterprise Linux 6.2.  I'll be using the RPMs
that are part of the RHEL 6.2 distribution - Pacemaker 1.1.6 with
Corosync 1.4.1.

This will - at the start - be a "poor man's" cluster - no SBD
shared-disk fencing, no hardware controllers for STONITH, no dedicated
network links.  Basically the two physical machines will have just
one, maybe two, network interfaces.  I'm happy to accept that some
answers might be along the lines of "you can't have a serious
cluster/STONITH in such circumstances".  That may be part of my
problem; I'm trying to determine when STONITH might make a difference,
and when it might be quite superfluous for my little cluster.

Right now my cluster is operational with 'stonith-enabled="false"'.
I've worked with clusters before, but STONITH is new to me, hence my
indecision.


Q1.  Is STONITH of any use in my simple cluster?

If each node has only one or two network connections to the other, and
the nodes will be using those very same connections to employ STONITH
against the other KVM node, is it worthwhile to set up STONITH?  I
gather that, if the networks fail and each node then believes the
other is dead, both will try to use STONITH to bring down the other.
But the STONITH requests will fail too, since they travel over the same
failed network.

But STONITH is also potentially of use in other (rarer?) cases: if
something is wrong with the Corosync configuration of one node, say,
or if a resource can't be stopped on one node for non-network reasons.
In those cases - where there's no actual _network_ problem - STONITH
will function correctly.

So should I set up STONITH anyway?
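
To make the two cases above concrete, here is a rough sketch of my
reasoning in Python.  It is only a model of my own assumptions, not of
what Pacemaker actually does: the names Failure and
fencing_can_reach_peer are made up for illustration, and the single
rule it encodes is "a STONITH request fails only when the network it
depends on is the thing that broke".

```python
from enum import Enum


class Failure(Enum):
    # The failure cases mentioned in Q1 (labels are illustrative only).
    NETWORK_DOWN = "network links between the nodes are down"
    COROSYNC_MISCONFIGURED = "Corosync is misconfigured on one node"
    RESOURCE_WONT_STOP = "a resource cannot be stopped on one node"


def fencing_can_reach_peer(failure: Failure,
                           fencing_shares_cluster_network: bool) -> bool:
    """Return True if a STONITH request could still get through.

    Assumption: the only thing that blocks fencing in this model is a
    failure of the network that the fencing agent itself relies on.
    """
    if failure is Failure.NETWORK_DOWN and fencing_shares_cluster_network:
        return False
    return True


if __name__ == "__main__":
    # My setup: the fencing agents use the same links as the cluster traffic.
    for failure in Failure:
        ok = fencing_can_reach_peer(failure, fencing_shares_cluster_network=True)
        print(f"{failure.value}: STONITH {'can work' if ok else 'fails too'}")
```

If that model is right, STONITH only helps me in the non-network
cases, unless the fencing agents get a path that doesn't share the
cluster links.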


Q2.  If a STONITH agent fails to bring down a node, will the
originating node continue to function as a cluster participant?

In the example above, if the network connection(s) fail between the
two cluster nodes I believe that each will try to bring down the
other with STONITH.  If each STONITH action _fails_ - because the
network is down, and the STONITH agents will be configured to use the
same network - will both nodes try to run the cluster's resources?

I understand that, if they do, the cluster will be running in a
split-brain condition.  But that's actually what I would prefer; I'd
rather have both nodes running the application in split-brain mode
than have _neither_ running it.  (The application is largely
'read-only'; it's more important for readers to have access to it
during any outage than to protect it from split-brain; reconciling the
split-brain data afterwards will be a simple case of picking one
node's data to supersede the other's.)

If the answer to this is "a node will suicide if its STONITH agents
fail to bring down other cluster nodes" then I guess it will follow
that my answer to Q1 will be "don't set up STONITH, then".


Q3.  What STONITH agent do I use to control KVM cluster nodes?

Looking at agents that can be used to shoot KVM VM guests I believe I
have these options:

3.1  fence_virtd using its libvirt backend; except that only works for
a single physical host/hypervisor, so I can't use it.

3.2  fence_virtd using its libvirt-qpid backend.  However, I haven't
been able to find any examples of how this is set up.  (Is a
libvirt-qpid daemon involved?  Does qpid/QMF need configuring on the
hosts?)  I saw that Adrian Allen posted a message to the Linux-HA
mailing list a few days ago (on 1/9) asking about setting up qmf/qpid
with fence_virt; like him I haven't found any examples of people using
it.

3.3  I _have_ found people referencing an 'external/libvirt' agent.
Does anyone know where I can find this agent?

3.4  I've also seen references to the 'fence_virsh' agent, which comes
with the RHEL 6.2 fence RPMs.  This agent seems to ssh from the
originating node to the hypervisor hosting the to-be-killed node and
then run virsh there to reboot the VM.

If I do go ahead with STONITH I assume I'll be using one of the above
agents, unless someone recommends another.  Can anyone point me to
guidelines on configuring qpid/QMF with fence_virtd (3.2)?  Or tell me
where I can find external/libvirt (3.3)?

Many thanks for any help!
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
