On Fri, Mar 8, 2013 at 1:06 PM, Alex Sudakar <[email protected]> wrote:
> Hi. I have a simple two-node cluster which serves up a web
> application in an active/passive configuration. The cluster is
> running Pacemaker 1.1.7-6 and Corosync 1.4.1-7 with Red Hat Enterprise
> Linux 6.3. Each node is a KVM (libvirt) virtual machine hosted on a
> Red Hat Enterprise 6.3 physical host (the same host at the moment, for
> testing purposes. Ultimately two separate physical hypervisor
> machines at two different locations).
>
> I have a simple question to ask about STONITH functionality and how it
> should be designed for my cluster.
>
> I have configured stonith agents for the two nodes using the
> stonith:fence_virsh fencing agent, which works fine. If there's a
> problem each node will use the appropriate resource agent to slay the
> other node/VM.
>
> But I'm wondering what will happen if the physical machine hosting the
> VM is down.
>
> My understanding is that a Pacemaker cluster will 'hang' if a stonith
> agent fails.

I prefer the term "block".
We do this because we can't tell the difference between the host being
down (in which case the guest is too) and the host being unreachable (in
which case the guest could be up and doing BadThings(tm)).
In the latter case, starting services that the other side had is highly
likely to create a world of pain.

> I've seen that mentioned in this mailing list and I've
> observed this to be the case in tests of my cluster. Put in the wrong
> login password for the fence_virsh agent to use and the stonith action
> never succeeds and pacemaker never proceeds to redeploy its other
> resources. The resources continue running as they were while the
> stonith procedure fails on its continuing attempts.
>
> What happens if the physical host is down? If the fence_virsh agent
> can't ssh into the host in the first instance, will it assume that (a)
> the host is down, and therefore (b) the guest node/VM must also be
> down, and thus return 'success' as to its fencing operation? Or will
> it (continually) return failure and so result in the cluster
> 'hanging'?

It might be configurable (or it might be possible to make it so; the
agent is written in Python, iirc).
But be _very_ sure you're comfortable with the risks of assuming
everything is good and returning success.

>
> I can't test to see what happens because I only have the one physical
> host at present for my tests. :)
>
> Can someone advise me on what fence_virsh reports if the physical host
> is down? What happens to the cluster?
>
> Thanks!
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
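
To make the trade-off concrete, here is a rough sketch of the decision a fence agent faces when it can't reach the hypervisor. This is not fence_virsh's actual code or option set; fence_decision, FenceResult and assume_off_if_host_unreachable are names I made up for illustration. The default keeps the safe behaviour (report failure, so the cluster blocks), and the flag shows the risky "assume it's dead" alternative.

```python
from enum import Enum


class FenceResult(Enum):
    SUCCESS = 0  # cluster may go ahead and recover the victim's resources
    FAILURE = 1  # cluster keeps blocking and retries fencing


def fence_decision(host_reachable: bool,
                   guest_confirmed_off: bool,
                   assume_off_if_host_unreachable: bool = False) -> FenceResult:
    """Hypothetical policy, not the real fence_virsh logic."""
    if host_reachable:
        # We could talk to the hypervisor, so trust what it says about the guest.
        return FenceResult.SUCCESS if guest_confirmed_off else FenceResult.FAILURE

    # Host unreachable: it may be dead (guest is too) or merely partitioned
    # (guest may still be running). The agent cannot tell which.
    if assume_off_if_host_unreachable:
        # Risky: if the guest is actually alive, both nodes may run the services.
        return FenceResult.SUCCESS
    return FenceResult.FAILURE
```

With the default, fence_decision(False, False) gives FenceResult.FAILURE and the cluster blocks. Turning the flag on trades that blocking for the possibility of both sides running the same services at once.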
