On 29/08/2013, at 4:18 PM, Ulrich Windl <[email protected]> wrote:
> Hi!
>
> After some short thinking I find that using ssh as STONITH is probably the
> wrong thing to do, because it can never STONITH if the target is down already.

Correct.

> (Is it documented that a STONITH operation should work independent of the
> target being up or down?)

STONITH must be reliable. Anything that doesn't work when the target is down is not reliable.

> Maybe some shared storage and a mechanism like sbd is the way to go. With
> everything VMs, shared storage shouldn't be a problem.
>
> So if your VMs fail because the host failed, the cluster will see the loss of
> communication, will try to fence the affected VMs (to be sure they are down),
> and then start to rerun failed resources. Right?

Assuming the VMs are cluster nodes, yes. In which case you could use fence_xvm or whatever the SUSE equivalent is.

> Regards,
> Ulrich
>
>>>> Alex Sudakar <[email protected]> schrieb am 29.08.2013 um 04:11 in
>>>> Nachricht <calq2s-f-m_b92quh0kbb4ozufstfk2gavyq19+xvsb7of8k...@mail.gmail.com>:
>>
>> Hi. I'm running a simple two-node cluster with Red Hat Enterprise
>> Linux 6, corosync 1.4.1, pacemaker 1.1.7 and crm. Nodes A and B are
>> each KVM VMs on separate physical machines/hypervisors.
>>
>> I have two separate networks between the machines; network X is a
>> bonded pair, network Y a direct link between the two hypervisors.
>> I've set up corosync to use X for ring #0 and Y for ring #1. The
>> clustered application is a Tomcat web application running on top of a
>> DRBD block device. I've configured DRBD to use X for its
>> communication link (DRBD apparently can't handle more than one
>> network/link). I've set up the DRBD 'fence-peer' and
>> 'after-resync-target' hooks to call the scripts that manage location
>> constraints when network X is down.
>>
>> I also appreciate the documentation out there that teaches one all of
>> this. :-)
>>
>> Nodes A & B each have two STONITH resources set up for the other.
>> Node A, for example, has resources 'fence_B_X' and 'fence_B_Y', which
>> are instances of the stonith:fence_virsh resource agent configured to
>> use either network X or network Y to ssh to B's physical host and
>> use 'virsh' to power off B.
>>
>> Everything's currently working as expected, but I have a couple of
>> questions about how STONITH works and how I might be able to tweak it.
>>
>> 1. How can I add STONITH for the physical machines into the mix?
>>
>> In one of my tests I powered off B's hypervisor while B was running
>> the clustered application. A tried to stonith B, but both of its fence
>> resources failed, since neither could ssh to that hypervisor to run
>> virsh:
>>
>>   stonith-ng[1785]: notice: can_fence_host_with_device: Disabling
>>   port list queries for fence_B_X (0/-1): Unable to connect/login to
>>   fencing device
>>   stonith-ng[1785]: info: can_fence_host_with_device: fence_B_X
>>   can not fence B (aka. 'B'): dynamic-list
>>
>> That's as expected, of course. As was the cluster's 'freeze' on
>> starting the application on A; I've been told on this list that
>> Pacemaker won't move resources if/while its STONITH operations have
>> failed.
>>
>> How could I add, say, a 'fence_ilo' stonith resource agent to the
>> mix? The idea being that, if a node can't successfully use
>> fence_virsh to power down the other VM, it can then escalate to
>> trying to power off that node's entire hypervisor.
>>
>> Q 1.1. Could I set up a stonith fence_ilo resource with suitable
>> pcmk_* parameters to 'map' the cluster node names A & B to the
>> hypervisor names that the fence_ilo resource would understand and use
>> to take down the hypervisor?
>>
>> Q 1.2. How can I prioritize fence resources?
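[An aside on Q 1.1: the standard stonith-ng device parameters pcmk_host_list, pcmk_host_map and pcmk_host_check exist for exactly this kind of mapping. A rough crm shell sketch, in which the iLO address, credentials and resource names are all made-up assumptions:

```
# Device that fences cluster node A by powering off A's hypervisor via iLO.
# pcmk_host_list tells stonith-ng which cluster node this device can fence;
# the hostname and credentials below are hypothetical.
primitive fence_A_hypervisor stonith:fence_ilo \
    params ipaddr=hyp-a-ilo.example.com login=fenceuser passwd=secret \
        pcmk_host_list="A" pcmk_host_check="static-list"
# Never run the device on the node it is meant to fence.
location l_fence_A_hypervisor fence_A_hypervisor -inf: A
```

For agents that take a "port" argument, such as fence_virsh acting on a libvirt domain, pcmk_host_map="A:vm-a-domain" additionally maps the cluster node name to the port name the agent should act on.]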
>> I haven't come across that sort of thing; please forgive me if it's a
>> basic configuration facility (of crm) that I've forgotten. (It's not
>> the same as priority weights in allocating a resource to a node, but
>> rather the order in which each stonith agent is invoked in a stonith
>> situation on that node.) Can I set up the cluster so fence_A_X and
>> fence_A_Y will be tried first, before the cluster attempts
>> 'fence_A_hypervisor'?
>>
>> Q2. How can I set up STONITH to proceed even if STONITH wasn't achieved?
>>
>> I understand the golden rule of a Pacemaker cluster is not to proceed
>> - to 'freeze' - if STONITH fails, and why that rule is in place: to
>> avoid the dangers of split-brain. Really. :-)
>>
>> But in this case - the improbable event that both networks are down -
>> my preference is for each node (if up) to independently run the
>> application, i.e. for each to assume the 'primary' role on its side
>> of the DRBD resource and run the Tomcat application. I'm okay with
>> that, I want that, I'll accept the burden of reconciling antagonistic
>> data updates after the crisis is over. Really. :-)
>>
>> Is there an accepted/standard way to tell a STONITH agent - or
>> fence_virsh - to return 'success' even if the STONITH operation
>> failed? Or a configuration item that instructs Pacemaker to proceed
>> 'after best effort'?
>>
>> Or do I have to 'hack' my copy of fence_virsh, look for where it
>> returns 'Unable to connect/login to fencing device', and have it
>> 'pretend' it didn't fail (i.e. the agent lies and says that the VM
>> is down)?
>>
>> Will anyone on this list speak to me if I admit to doing something
>> like that? :-)
>>
>> Many thanks for any help!
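[On Q 1.2: Pacemaker's "fencing levels" provide exactly this ordering; in crm shell it is the fencing_topology directive (check that your crmsh/pacemaker build supports it; it appeared around the 1.1.7/1.1.8 era). A sketch using the resource names from the message above, with 'fence_A_hypervisor'/'fence_B_hypervisor' as hypothetical iLO devices:

```
# Each space-separated entry is a fencing level, tried left to right;
# a later level is only attempted after the earlier level has failed.
# (Comma-joined devices within a single level must ALL succeed.)
fencing_topology \
    A: fence_A_X fence_A_Y fence_A_hypervisor \
    B: fence_B_X fence_B_Y fence_B_hypervisor
```
]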
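[On Q2: there is no supported Pacemaker option to treat failed fencing as success, for the split-brain reasons already given. If that risk is accepted anyway, the usual hack is not to patch fence_virsh but to wrap it. A minimal POSIX sh sketch of the exit-status override (the function name and agent path are assumptions; this deliberately defeats fencing's safety guarantee):

```shell
# fence_with_override: run a real fence agent; if it fails, log the
# failure but report success anyway so the cluster proceeds.
# WARNING: this invites split-brain; use only if you truly accept that.
fence_with_override() {
    real_agent="$1"    # e.g. /usr/sbin/fence_virsh (path is an assumption)
    shift
    if "$real_agent" "$@"; then
        return 0       # genuine success
    fi
    # Agent failed (e.g. hypervisor unreachable): log it, then lie.
    logger -t fence_override \
        "fence agent $real_agent failed; reporting success anyway" \
        2>/dev/null || true
    return 0
}
```

Installed as a stonith "agent" a wrapper would also need the usual fence-agent argument/stdin handling; the sketch shows only the override itself.]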
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
