Hi Ian,
I think there is a flaw in the design. Say, for example, the network card fails
on machine A. Machine B detects this and tries to fence machine A. The problem
with fencing via ssh to modify iptables is that there is no network
connectivity to machine A, so this mechanism will never work. Pulling the
power cord, as in your test, is the same case, which is why your log shows
"No route to host". What you need is a solution that works independently of
the OS, such as a power switch or a remote management interface like IBM RSA
II or HP iLO. Fencing has to be absolute and ruthless: in this example,
machine B must be able to fence machine A every single time there is a
problem, and as soon as the problem occurs.
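A toy model may make this concrete. The Python sketch below is purely illustrative: the agent functions, node names and the `max_rounds` cap are hypothetical and do not reflect how fenced or the real fence agents are implemented. It shows that an agent which depends on the victim's network can never succeed once that network is gone, while an out-of-band power agent can.

```python
from typing import Callable, Dict, Optional, Sequence, Set

# Hypothetical model: a fence agent takes a node name and returns True only
# if it could confirm the node is cut off. This is NOT the fenced or
# fence-agents API, just an illustration.
FenceAgent = Callable[[str], bool]


def make_ssh_iptables_agent(reachable: Set[str]) -> FenceAgent:
    """In-band agent: needs the victim's network and OS to still respond."""
    def agent(node: str) -> bool:
        # A dead NIC or pulled power cord means ssh fails ("No route to host"),
        # so the firewall rule can never be applied.
        return node in reachable
    return agent


def make_power_switch_agent(outlets: Dict[str, bool]) -> FenceAgent:
    """Out-of-band agent: talks to a power switch or RSA/iLO-style controller."""
    def agent(node: str) -> bool:
        if node not in outlets:
            return False
        outlets[node] = False  # power off; works whether or not the OS is alive
        return True
    return agent


def fence_node(node: str, agents: Sequence[FenceAgent], max_rounds: int) -> Optional[int]:
    """Try each agent in order, round after round, until one succeeds.

    Returns the round in which fencing succeeded, or None if every round
    failed. Service recovery (moving the VIP, starting Squid) may only
    proceed after a success.
    """
    for round_no in range(1, max_rounds + 1):
        for agent in agents:
            if agent(node):
                return round_no
    return None


if __name__ == "__main__":
    # Machine A has lost power: only machine B is reachable over the network.
    reachable = {"machineB"}
    outlets = {"machineA": True, "machineB": True}

    ssh_only = [make_ssh_iptables_agent(reachable)]
    print(fence_node("machineA", ssh_only, max_rounds=5))    # None: recovery blocked

    with_power = [make_ssh_iptables_agent(reachable), make_power_switch_agent(outlets)]
    print(fence_node("machineA", with_power, max_rounds=5))  # 1: safe to take over
```

`max_rounds` exists only so the sketch terminates. Giving up and taking over the service anyway would be unsafe, because a node that merely stopped answering on the network may still be alive and holding the VIP.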
Regards
John
----- Original Message -----
From: Ian Hayes
To: [email protected]
Sent: Friday, April 10, 2009 1:07 AM
Subject: [Linux-cluster] Fenced failing continuously
I've been testing a newly built 2-node cluster. The cluster resources are a
virtual IP and Squid, so if a node fails, the VIP should move to the surviving
node, which then starts Squid. I'm running a modified fencing agent that SSHes
into the failing node and firewalls it off via iptables (not my choice).
This all works fine for graceful shutdowns, but when I do something nasty,
like pulling the power cord on the node currently running the service, the
surviving node never takes over the service. Instead it spends all its time
trying to run the fence agent, which obviously cannot work because the server
is completely offline. The only way I can get the surviving node to take over
the VIP and start Squid is to run fence_ack_manual, which rather defeats the
purpose of running a cluster in the first place. The logs are filled with:
Apr 12 00:01:44 <hostname> fenced[3223]: fencing node "<otherhost>"
Could not disable xx.xx.xx.xx on 23]: agent "fence_iptables" reports:
ssh: connect to host xx.xx.xx.xx port 22: No route to host
Is this a misconfiguration, or is there an option I can set somewhere to
tell the nodes to give up after a certain number of tries?
------------------------------------------------------------------------------
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster