Hi there,

I have a two-node cluster based on SLES 10 SP2. STONITH is enabled, and
the cluster uses IPMI to kill a dead node.

I am now testing the cluster and how it behaves when a node fails.
First, I used iptables to block the UDP packets of one node. After a short
time, that node was STONITHed and the surviving node took over its
resources.
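For anyone wanting to reproduce the iptables test, here is a minimal sketch that builds the rules for blocking the heartbeat traffic on one node. It assumes Heartbeat's default UDP port 694 (the `udpport` directive in ha.cf); the function names are hypothetical, and the commands are only printed unless `dry_run=False` is passed (which needs root).

```python
import subprocess

# Assumption: Heartbeat's default UDP port; adjust if ha.cf sets "udpport".
HEARTBEAT_UDP_PORT = 694


def block_heartbeat_cmds(port: int = HEARTBEAT_UDP_PORT) -> list[list[str]]:
    """Return iptables commands that drop inbound and outbound heartbeat UDP packets."""
    return [
        ["iptables", "-A", "INPUT", "-p", "udp", "--dport", str(port), "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-p", "udp", "--dport", str(port), "-j", "DROP"],
    ]


def apply(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them for real when dry_run is False (requires root)."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(block_heartbeat_cmds())
```

Dropping the packets in both directions makes the node look dead to its peer, which should trigger the same STONITH path described above.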

I tested the same scenario by pulling the power cables, also with success.
In my last test, I forgot to plug the power cable back in, and the failover
failed because the surviving node kept trying to reset/kill the dead node:

stonithd[6810]: 2008/12/08_09:28:08 info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi off bdmz02' returned 256
stonithd[6810]: 2008/12/08_09:28:08 CRIT: external_reset_req: 'ipmi off' for host bdmz02 failed with rc 256
stonithd[7151]: 2008/12/08_09:28:08 info: Failed to STONITH node bdmz02 with one local device, exitcode = 5. Will try to use the next local device.
stonithd[7151]: 2008/12/08_09:28:29 ERROR: Failed to STONITH the node bdmz02: optype=POWEROFF, op_result=TIMEOUT
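The value 256 in the first two lines is most likely a raw wait status (as returned by `system()`/`waitpid()`) rather than the plugin's exit code. A small sketch of the standard Linux encoding shows that 256 corresponds to the ipmi plugin exiting with code 1, presumably because it could not reach the unpowered BMC; `decode_wait_status` is a hypothetical helper, not part of stonithd.

```python
def decode_wait_status(status: int) -> tuple[str, int]:
    """Decode a raw wait status using the Linux/glibc bit layout."""
    low_bits = status & 0x7F
    if low_bits == 0:
        # Normal exit: the exit code is stored in bits 8..15.
        return ("exited", (status >> 8) & 0xFF)
    if low_bits == 0x7F:
        # Stopped child (only reported with WUNTRACED): stop signal in bits 8..15.
        return ("stopped", (status >> 8) & 0xFF)
    # Terminated by a signal: the signal number is in the low 7 bits.
    return ("signaled", low_bits)


print(decode_wait_status(256))  # ('exited', 1)
```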

After I plugged the cable back in (without starting the server), the cluster
recognized that the STONITH device of the dead server was reachable again,
and it started the failover.
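The behaviour above can be pictured with a simplified, hypothetical model (not stonithd's actual code): the surviving node tries each STONITH device in order, as the "Will try to use the next local device" log line suggests, and only takes over resources once a power-off is confirmed. While the peer's IPMI interface has no power, every attempt fails and the resources stay where they are.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: a fencing device is a callable that returns True only
# when it can confirm that the target node is powered off.
FenceDevice = Callable[[str], bool]


def fence(node: str, devices: Sequence[FenceDevice]) -> bool:
    """Try each device in order; succeed on the first confirmed power-off."""
    for device in devices:
        if device(node):
            return True
    return False


def recover(node: str, devices: Sequence[FenceDevice], attempts: int) -> Optional[int]:
    """Return the attempt on which failover may start, or None if fencing never succeeds."""
    for attempt in range(1, attempts + 1):
        if fence(node, devices):
            return attempt  # peer confirmed off: safe to take over its resources
        # Fencing failed: the peer's state is unknown, so no takeover happens.
    return None


def make_ipmi(unreachable_for: int) -> FenceDevice:
    """Simulated IPMI device whose BMC only answers after `unreachable_for` calls."""
    calls = {"n": 0}

    def ipmi_off(node: str) -> bool:
        calls["n"] += 1
        return calls["n"] > unreachable_for  # succeeds once power is restored

    return ipmi_off


print(recover("bdmz02", [make_ipmi(3)], attempts=10))   # 4
print(recover("bdmz02", [make_ipmi(99)], attempts=10))  # None
```

In this model, an unpowered BMC is indistinguishable from a node that is still running but unreachable, which is why the failover waits instead of proceeding.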

How can I handle or solve this problem? It can happen that one server room
loses its power supply, in which case the server (including its IPMI
interface) has no power at all.

Greetings,
Adrian
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
