Hi there,

I have a two-node cluster based on SLES 10 SP2. The cluster has STONITH enabled and uses IPMI to kill a dead node.

I am now testing how the cluster behaves when a node fails. I used iptables to block the UDP packets of one node. After a short time the node gets stonithed and the surviving node takes over the resources of the dead node. I ran the same test by pulling the power cables, and that worked as well.

In my last test I forgot to plug the power cable back in, and the failover failed because the surviving node keeps trying to reset / kill the dead node:

stonithd[6810]: 2008/12/08_09:28:08 info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi off bdmz02' returned 256
stonithd[6810]: 2008/12/08_09:28:08 CRIT: external_reset_req: 'ipmi off' for host bdmz02 failed with rc 256
stonithd[7151]: 2008/12/08_09:28:08 info: Failed to STONITH node bdmz02 with one local device, exitcode = 5. Will try to use the next local device.
stonithd[7151]: 2008/12/08_09:28:29 ERROR: Failed to STONITH the node bdmz02: optype=POWEROFF, op_result=TIMEOUT

After plugging the cable back in (but without starting the server), the surviving node recognizes that the STONITH device of the dead server is reachable again, and the cluster starts the failover.

How can I handle or solve this problem? It can happen that one server room loses its power supply, in which case the server there has no power at all.

Greetings,
Adrian
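
P.S. In case it helps anyone going through similar logs: below is a minimal sketch, assuming the stonithd log format shown above, of how the failed STONITH attempts can be filtered out of the log. The names LINE_RE, failed_stonith_events and SAMPLE are made up for this illustration and are not part of heartbeat or stonithd; the sample lines are the ones quoted in this mail.

```python
import re

# Assumed line layout, taken from the log excerpt in this mail:
#   stonithd[<pid>]: <YYYY/MM/DD_HH:MM:SS> <level>: <message>
LINE_RE = re.compile(
    r"stonithd\[(?P<pid>\d+)\]: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): "
    r"(?P<message>.*)"
)


def failed_stonith_events(lines):
    """Return (timestamp, level, message) for every stonithd line
    whose message mentions a failure."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and "fail" in match.group("message").lower():
            events.append(
                (match.group("stamp"), match.group("level"), match.group("message"))
            )
    return events


# The four log lines quoted in this mail.
SAMPLE = [
    "stonithd[6810]: 2008/12/08_09:28:08 info: external_run_cmd: Calling "
    "'/usr/lib64/stonith/plugins/external/ipmi off bdmz02' returned 256",
    "stonithd[6810]: 2008/12/08_09:28:08 CRIT: external_reset_req: "
    "'ipmi off' for host bdmz02 failed with rc 256",
    "stonithd[7151]: 2008/12/08_09:28:08 info: Failed to STONITH node bdmz02 "
    "with one local device, exitcode = 5. Will try to use the next local device.",
    "stonithd[7151]: 2008/12/08_09:28:29 ERROR: Failed to STONITH the node "
    "bdmz02: optype=POWEROFF, op_result=TIMEOUT",
]

if __name__ == "__main__":
    for stamp, level, message in failed_stonith_events(SAMPLE):
        print(stamp, level, message)
```

The filter only looks for the word "fail" in the message, so it is a rough aid for reading the log and not a reliable status check.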
