On Wed, Jan 16, 2013 at 2:01 PM, Digimer <[email protected]> wrote:

> Welcome to the addiction^h^h^hcommunity.

Hello Digimer!  Thanks for the speedy reply.

And the warning about the addiction.  :)

> In case it matters; Red Hat supports corosync + cman + rgmanager in RHEL
> 6.x. Pacemaker is scheduled to replace cman/rgmanager in RHEL 7, but
> until then, it's in tech-preview only and doesn't get updates between
> y-stream updates.

That's useful to know, thanks.  It's because of Red Hat's plan to
move to Pacemaker that I'm comfortable investing the time in learning
it now, rather than setting things up with cman/rgmanager.  We won't
be asking Red Hat to support this cluster.

> Also; Why 6.2 when 6.3 has been out for a long time?

It's just the DVD that I have to hand.  :)

But okay, I'll download 6.3, thanks for the nudge.  :)

> The docs are right; Fencing is really really important. I'd go so far as
> to say that your cluster is fatally flawed without proper fencing.

I have seen the various subtle RED HAT WILL NOT SUPPORT CLUSTERS
WITHOUT FENCING notes along the way in my reading, yes.  :)

> I personally avoid this by using Active/Passive bonds, with each link in
> a different switch, plus two fence devices. I put IPMI on the first
> switch and PDU fence devices on the other switch. This way, at least one
> fence device is available, no matter what.

Ah, so clearly Pacemaker is clever enough to cycle through all of its
STONITH devices that apply to a node until one works.  That makes
sense, but I wasn't sure; most/all examples I've seen only mention a
single STONITH agent.  That's a useful example, thanks.
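For the archives, here's roughly how I understand two fence devices per
node would be chained with a fencing topology in the crm shell (just a
sketch; the resource names, node name, addresses and credentials are all
made up for illustration):

```
# Sketch: two fence devices for node1, tried in order (IPMI first, PDU second).
# All names/addresses below are hypothetical examples.
primitive ipmi-node1 stonith:fence_ipmilan \
    params ipaddr=10.0.0.11 login=admin passwd=secret pcmk_host_list=node1
primitive pdu-node1 stonith:fence_apc_snmp \
    params ipaddr=10.0.0.21 port=1 pcmk_host_list=node1
fencing_topology \
    node1: ipmi-node1 pdu-node1
```

If the IPMI fence fails (say its switch is down), Pacemaker should fall
through to the PDU device on the other switch.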

> I understand this is outside your current resources, but I would still
> implement fencing.

Yes.  I may be constrained to just a single network/switch, though,
which is why I'm still dubious.

> You would have to lose the switch. The lose of one link alone, say the
> network cable or interface used by corosync on one node dies, the other
> node will still successfully reach the failed node's fence device and
> kill it. This is why fence devices must exist outside of the target node.

Right.  But with VMs, the node VMs sit on the same network as the
hypervisor that does the killing, so the fencing traffic and the
cluster traffic really share the same network, don't they?

My concern about STONITH for my little cluster remains - with a single
network, if it fails then I'm going to have both nodes trying to kill
each other ... and from what you told me earlier - "a failed fence
action will leave the cluster hung" - that means the application will
be left down on both.

What exactly do you mean by "cluster hung"?  Will the nodes suicide,
cycle with never-ending attempts to kill each other (before proceeding
to run resources), or just go into some 'suspended' state?

> "Largely" read-only is not entirely read-only. If it was truly
> read-only, then why use a cluster at all?

True.  That's why I described it as "largely read-only".

Our situation is ... in normal operation the cluster application will
be read/write.  In times of an outage it will be used mainly to
provide information for recovery purposes; i.e. mainly in a 'read'
capacity.  If any changes are made during the outage (by accident or
otherwise) then we're content to throw away those changes if
necessary.  It's much more important for the application to be running
during the outage itself.

> If your nodes are KVM VMs, you can use either 'fence_xvm' or 'fence_virsh'.
>
> ...
>
> Take a look at fence_xvm as well, it's multicast-based.

I gather, from reading the various man pages, that fence_xvm is a
'front end' to the fence_virtd daemon, using multicast to send fencing
requests to that daemon.  But I would have to configure fence_virtd to
use its 'libvirt-qpid' back-end to actually send/receive QMF/AMQP
requests via qpid in order to reboot the bad KVM node.  The
configuration necessary for that is currently a mystery to me.
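(From what I can piece together, the qpid back-end is only needed when
the VMs can migrate between multiple hypervisors; with a single
hypervisor the plain 'libvirt' back-end looks sufficient.  My current
guess at an /etc/fence_virt.conf for that case - normally generated
interactively with `fence_virtd -c`, and assuming a bridge named br0
and the conventional key path - would be something like:

```
# Sketch of /etc/fence_virt.conf; bridge name and key path are assumptions.
fence_virtd {
        listener = "multicast";
        backend = "libvirt";       # libvirt-qpid would go here for multi-host
}
listeners {
        multicast {
                key_file = "/etc/cluster/fence_xvm.key";
                address = "225.0.0.12";
                interface = "br0";
                family = "ipv4";
        }
}
backends {
        libvirt {
                uri = "qemu:///system";
        }
}
```

The same key file would need to be copied into the guests for fence_xvm
to authenticate against the daemon - please correct me if I've got any
of that wrong.)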

Thank you for your help, I appreciate it!
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
