In my GFS cluster, I use DRAC cards as the fencing device for each node.
Yesterday, the DRAC card on one node failed: it would not allow remote
logins, but it still answered pings.  I don't know how long the card had
been dead.  I only noticed because I tried to fence the node manually and
fencing failed, which made recovering the cluster afterwards a lot more
painful.  So, I have uncovered a pretty scary failure scenario for my
cluster configuration.
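
To illustrate why a ping check would not have caught this: the card was
still answering pings, so any probe would have to go a step further and at
least confirm that the card accepts a connection on its remote-login port.
Below is a minimal sketch of that idea in Python.  The function name, the
host names, and the choice of port 22 (SSH) are my own assumptions for
illustration, this is not an RHCS feature, and a successful connect still
does not prove that a real fence operation would succeed.

```python
import socket


def accepts_tcp_connection(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This is a stronger check than ping, but weaker than a real login:
    it only shows that something is listening on the management port.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution errors
        return False


# Hypothetical usage (host names and port are placeholders):
#
#   for card in ["node1-drac.example.com", "node2-drac.example.com"]:
#       if not accepts_tcp_connection(card, 22):
#           print("WARNING: fence device %s is not accepting logins" % card)
```

Something like this could run from cron and send mail on failure, so a
dead card is noticed before the day fencing is actually needed.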

My question is: what (if anything) can RHCS/GFS do to monitor the health,
presence, and operation of fencing devices?  If it can monitor them and
discovers a bad fencing device, what will it do?  For example, if I unplug
the network cable for the heartbeat, the node gets fenced immediately.  I
have never tested what happens if I unplug a fencing device instead.  I
haven't delved into the documentation in a while, but I don't remember
anything about a way to configure redundant fencing devices, such as a DRAC
plus a network power switch.  Is there a way?

Thoughts, opinions, insight, pointers to documentation, etc. would be
greatly appreciated.

--
Brandon
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
