I have a 12-node cluster, and all nodes are connected to APC AP7900
rack PDU devices.

 

A manually executed stonith command resets the outlet as expected:

 

# stonith -t external/rackpdu -T reset -p "rack_pdu_ip write_snmp_community outlet_number" nodename
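
For checking the quoting, a minimal sketch of a dry-run helper follows. The function name stonith_reset_cmd and the sample values (192.0.2.10, private, node01) are hypothetical placeholders. It only prints the command line, so the single quoted -p argument with its three space-separated fields can be inspected before anything is power cycled.

```shell
# Hypothetical helper: print the stonith invocation instead of running it.
# Arguments (all placeholders): PDU address, SNMP write community,
# outlet number, node name.
stonith_reset_cmd() {
    pdu_ip=$1
    community=$2
    outlet=$3
    node=$4
    # The three plugin parameters are passed as ONE quoted -p argument.
    printf 'stonith -t external/rackpdu -T reset -p "%s %s %s" %s\n' \
        "$pdu_ip" "$community" "$outlet" "$node"
}

stonith_reset_cmd 192.0.2.10 private 1 node01
```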

 

The stonith command requires a nodename, but it does not matter what I
put there, since the external plugin does not seem to use it (which
seems odd).

 

Heartbeat 2.1.3 is configured with symmetric cluster = false, stonith
enabled = true, resource stickiness = INFINITY, and crm = yes.

 

When I disable "stonith enabled" I get clean failovers when a node dies.
With stonith enabled, I get a log entry on the DC saying that STONITH
has been scheduled, but then nothing happens: no STONITH, no failover,
just orphaned resources.

 

I have created the stonith external/rackpdu resource and a constraint
that makes it run on only one node (the node that is home to the DRBD
peer). The resource shows as running on the node that the failover
would normally go to when stonith is disabled, and it is set up to
STONITH the node that the resource runs on normally.

 

What further debugging can I do to determine why the STONITH gets
scheduled but never executes?

 

There are no entries in the syslog about a STONITH script failure. The
script should execute snmpset, and I have tested that the command as
formatted by the script does execute and produce the desired results
(when run as root):

 

# snmpset -v 1 -c community_name pdu_hostname .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 3
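
To make the structure of that command explicit, here is a small sketch that rebuilds it from its parts. It assumes the last component of the OID is the outlet number and that the integer value 3 is the reset request, as in the command above. The function name pdu_reset_cmd and the variable OUTLET_CTL_OID are hypothetical and do not come from the rackpdu script. The helper prints the command rather than running it.

```shell
# Base OID copied from the snmpset command above, without the trailing
# index. Assumption: that trailing index is the outlet number.
OUTLET_CTL_OID=".1.3.6.1.4.1.318.1.1.12.3.3.1.1.4"

# Hypothetical helper: print (do not run) the snmpset command for one outlet.
# Arguments: PDU hostname, SNMP write community, outlet number.
pdu_reset_cmd() {
    printf 'snmpset -v 1 -c %s %s %s.%s i 3\n' \
        "$2" "$1" "$OUTLET_CTL_OID" "$3"
}

pdu_reset_cmd pdu_hostname community_name 1
```

Comparing this output with what the plugin actually builds for a given outlet is one way to confirm that the parameters reach the script in the expected order.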

 

Any suggestions?

 

Thank you

 

 

 

 

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
