I have a cluster set up with two Dell servers, dual Ethernet heartbeats,
and a single 8-port APCMaster PDU switch. The cluster works except
for one issue: the cloned STONITH resource refuses to call the
APCMaster to power down the node that's "dead". Reading through
the logs, I can see that during setup stonithd asks the
apcmastersnmp module to check its host list, and it returns the
correct hostnames, "tpc-dal-prlores3 tpc-dal-tcfs2". However, when the
time comes to actually use the device, stonithd logs the following
and refuses to kill the other node:

Mar  3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: info: te_fence_node: Executing poweroff fencing operation (24) on TPC-DAL-PRLORES3 (timeout=60000)
Mar  3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: debug: waiting for the stonith reply msg.
Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: client tengine [pid: 15805] requests a STONITH operation POWEROFF on node TPC-DAL-PRLORES3
Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: we can't manage TPC-DAL-PRLORES3, broadcast request to other nodes
Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: debug: inserted optype=POWEROFF, key=-2
Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: Broadcasting the message succeeded: require others to stonith node TPC-DAL-PRLORES3.
Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: debug: stonithd_node_fence: sent back a synchronous reply.
Mar  3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: debug: stonithd_node_fence:574: stonithd's synchronous answer is ST_APIOK
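
One difference is visible in the excerpt: stonithd refers to the target as TPC-DAL-PRLORES3 (upper case), while the host list the plugin returned during setup is lower case. The Python sketch below assumes, without confirming it from the stonithd source, that stonithd matches the requested node name against the plugin's host list exactly. Under that assumption, the case difference alone would make the node look unmanageable. The helpers `target_node` and `can_manage` are hypothetical and are not stonithd code.

```python
import re
from typing import Optional

# Host list as reported by the apcmastersnmp plugin during setup (quoted above).
PLUGIN_HOSTLIST = "tpc-dal-prlores3 tpc-dal-tcfs2"

# The stonithd line that declines the request (copied from the log excerpt).
LOG_LINE = ("Mar  3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: "
            "we can't manage TPC-DAL-PRLORES3, broadcast request to other nodes")


def target_node(line: str) -> Optional[str]:
    """Extract the node name from a "we can't manage <node>," message."""
    m = re.search(r"we can't manage (\S+?),", line)
    return m.group(1) if m else None


def can_manage(node: str, hostlist: str, case_sensitive: bool) -> bool:
    """Hypothetical membership test; an assumption, not stonithd's real logic."""
    hosts = hostlist.split()
    if case_sensitive:
        return node in hosts
    return node.lower() in (h.lower() for h in hosts)


node = target_node(LOG_LINE)
print(node)                                                        # TPC-DAL-PRLORES3
print(can_manage(node, PLUGIN_HOSTLIST, case_sensitive=True))      # False
print(can_manage(node, PLUGIN_HOSTLIST, case_sensitive=False))     # True
```

If the assumption holds, an exact-match lookup fails for the uppercase node name, which would be consistent with the "we can't manage" line and the request being broadcast to the other nodes instead.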


The STONITH resource is configured as follows:

    <clone id="fencing">
      <primitive class="stonith" id="apcstonith23" type="apcmastersnmp">
        <operations id="apcstonith23-operations">
          <op id="apcstonith23-op-monitor-15" interval="15" name="monitor" start-delay="15" timeout="15"/>
        </operations>
        <instance_attributes id="apcstonith23-instance_attributes">
          <nvpair id="nvpair-604e339f-a400-4b30-82c0-f046de0ed663" name="ipaddr" value="172.20.1.23"/>
          <nvpair id="nvpair-ed611421-97a1-4091-a5cd-8159f1230096" name="port" value="161"/>
          <nvpair id="nvpair-997431e2-ea78-4065-b835-f9149bbcb596" name="community" value="private"/>
        </instance_attributes>
      </primitive>
      <meta_attributes id="fencing-meta_attributes">
      </meta_attributes>
    </clone>


I can confirm that the device itself works: running "stonith -t
apcmastersnmp <params> tpc-dal-prlores3" by hand switches off the
server.

Any help would be appreciated.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
