I have a 12-node cluster, and all nodes are connected to APC AP7900 rack PDUs.
A manually executed stonith command resets the outlet as expected:

    # stonith -t external/rackpdu -T reset -p "rack_pdu_ip write_snmp_community outlet_number" nodename

The stonith command requires a node name, but it does not matter what I put there, because the external plugin does not seem to use it (which seems odd).

Heartbeat 2.1.3 is configured with symmetric cluster = false, stonith enabled = true, resource stickiness = INFINITY and crm = yes. When I disable "stonith enabled", I get clean failovers when a node dies. With stonith enabled, the DC logs that STONITH has been scheduled, but then nothing happens: no STONITH, no failover, just orphaned resources.

I have created the external/rackpdu stonith resource and a constraint that makes it run on only one node (the node that hosts the DRBD peer). The resource shows as running on the node that the failover would normally go to when stonith is disabled, and it is set up to STONITH the node where the resources normally run.

What further debugging can I do to determine why the STONITH gets scheduled but never executes? There are no entries in the syslog about a STONITH script failure. The script should execute snmpset, and I have tested that the command, as formatted by the script, does execute and produce the desired result (when run as root):

    # snmpset -v 1 -c community_name pdu_hostname .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 3

Any suggestions?

Thank you

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
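As a sketch for cross-checking what the plugin sends against the manual test, the shell function below rebuilds the snmpset call from the post. It assumes, based only on the example command (which ends in `.4.1` for outlet 1), that the last OID component is the outlet number, and that the value 3 resets the outlet as it did in the manual test. `pdu_reset_cmd` is a hypothetical helper, not part of Heartbeat or the external/rackpdu plugin; it only prints the command, so running it sends nothing to the PDU.

```shell
#!/bin/sh
# Hypothetical helper (not part of Heartbeat or the external/rackpdu plugin):
# print the snmpset command that resets one outlet on an APC rack PDU.
# Assumption: the outlet number is the final OID component, as in the
# ".4.1" (outlet 1) example from the post; value 3 resets the outlet.
pdu_reset_cmd() {
    pdu_host=$1
    community=$2
    outlet=$3

    # Reject empty or non-numeric outlet numbers.
    case $outlet in
        ''|*[!0-9]*)
            echo "pdu_reset_cmd: outlet must be a number" >&2
            return 1
            ;;
    esac

    # Print only; nothing is sent to the PDU.
    printf 'snmpset -v 1 -c %s %s .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%s i 3\n' \
        "$community" "$pdu_host" "$outlet"
}

# Example: reproduces the manual command for outlet 1.
pdu_reset_cmd pdu_hostname community_name 1
```

If the command the plugin actually issues differs from this output (for example in the outlet index or community string), that difference would be worth ruling out before looking further into the cluster side.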
