I replaced the strncmp() calls in the stonithd.c function that matches node_name against the device's list of controlled hosts with the case-insensitive strncasecmp(), and it's working like a champ now.
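
A minimal sketch of the kind of comparison described above. This is not the actual stonithd.c code: the helper name host_in_list, the NULL-terminated array layout, and the MAX_NAME_LEN bound are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strncasecmp() is declared here on POSIX systems */

#define MAX_NAME_LEN 256  /* hypothetical upper bound on a node name */

/* Hypothetical helper: return 1 if node_name matches any entry in the
 * NULL-terminated hosts array, ignoring case; return 0 otherwise. */
static int host_in_list(const char *node_name, const char *const hosts[])
{
    assert(node_name != NULL && hosts != NULL);

    for (size_t i = 0; hosts[i] != NULL; i++) {
        /* strncmp() here would treat "TPC-DAL-PRLORES3" and
         * "tpc-dal-prlores3" as different names. */
        if (strncasecmp(hosts[i], node_name, MAX_NAME_LEN) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With a hosts list of { "tpc-dal-prlores3", "tpc-dal-tcfs2", NULL }, host_in_list("TPC-DAL-PRLORES3", hosts) returns 1, whereas the same loop with strncmp() would return 0.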
Are the node names case sensitive or insensitive? If they are insensitive, it might be a good idea to do all node name comparisons with strncasecmp() instead, just to thwart any future case issues. :)

On Thu, Mar 4, 2010 at 4:02 AM, Andreas Kurz <[email protected]> wrote:
> On Wednesday 03 March 2010 20:40:18 Brian Wolfe wrote:
>> I have a cluster setup with 2 dell servers, dual ethernet heartbeats,
>> and a single 8-port APCMaster PDU switch. The cluster works except
>> for one issue. The cloned stonithd interface refuses to make a call to
>> the apcmaster to power down the node that's "dead". Reading through
>> the logs I can see that during setup the stonithd asks the
>> apcmastersnmp module to check it's hosts list and it returns the
>> correct hostnames "tpc-dal-prlores3 tpc-dal-tcfs2". However when the
>> time comes for it to actually use the device I get the following
>> message from stonithd refusing to actually kill the other node.
>
> hmm .... the outlet names of the PDU are also uppercase?
>
> Regards,
> Andreas
>
>>
>> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: info: te_fence_node:
>> Executing poweroff fencing operation (24) on TPC-DAL-PRLORES3
>> (timeout=60000)
>> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: debug: waiting for the
>> stonith reply msg.
>> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: client tengine
>> [pid: 15805] requests a STONITH operation POWEROFF on node
>> TPC-DAL-PRLORES3
>> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: we can't manage
>> TPC-DAL-PRLORES3, broadcast request to other nodes
>> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: debug: inserted
>> optype=POWEROFF, key=-2
>> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: info: Broadcasting
>> the message succeeded: require others to stonith node
>> TPC-DAL-PRLORES3.
>> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: [15800]: debug:
>> stonithd_node_fence: sent back a synchronous reply.
>> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: [15805]: debug:
>> stonithd_node_fence:574: stonithd's synchronous answer is ST_APIOK
>>
>>
>> The stonith is configured as follows:
>>
>> <clone id="fencing" >
>> <primitive class="stonith" id="apcstonith23" type="apcmastersnmp" >
>> <operations id="apcstonith23-operations" >
>> <op id="apcstonith23-op-monitor-15" interval="15"
>> name="monitor" start-delay="15" timeout="15" />
>> </operations>
>> <instance_attributes id="apcstonith23-instance_attributes" >
>> <nvpair id="nvpair-604e339f-a400-4b30-82c0-f046de0ed663"
>> name="ipaddr" value="172.20.1.23" />
>> <nvpair id="nvpair-ed611421-97a1-4091-a5cd-8159f1230096" name="port"
>> value="161" />
>> <nvpair id="nvpair-997431e2-ea78-4065-b835-f9149bbcb596"
>> name="community" value="private" />
>> </instance_attributes>
>> </primitive>
>> <meta_attributes id="fencing-meta_attributes" >
>> </meta_attributes>
>> </clone>
>>
>>
>> I can confirm the use of the stonith via the command "stonith -t
>> apcmastersnmp <params> tpc-dal-prlores3" and it'll switch off the
>> server.
>>
>> Any help would be appreciated.
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
>
