Hello all,

I've just had the opportunity to play with a fence device, a WTI NPS. I managed to test a power cycle using this command:

  stonith -v -t wti_nps ipaddr=192.168.0.100 password=123456 -l -T reset station8

However, I'm not really clear on how to apply STONITH in Linux-HA v2. I've been digging around and found this page:

  http://www.linux-ha.org/ConfiguringStonithPlugins

If I understand it correctly, in a 2-node cluster we need to set up two stonith resources, each one's job being to shoot the other node in the head? What confuses me more is which parameter/attribute I can use to define "station8". For the wti_nps native resource, the only parameters mentioned are "ipaddr" and "password".

Here's my related CIB:

<clone id="DoFencing">
  <meta_attributes id="DoFencing_meta_attrs">
    <attributes>
      <nvpair id="DoFencing_metaattr_target_role" name="target_role" value="stopped"/>
      <nvpair id="DoFencing_metaattr_clone_max" name="clone_max" value="2"/>
      <nvpair id="DoFencing_metaattr_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </meta_attributes>
  <primitive id="resource_" class="stonith" type="wti_nps" provider="heartbeat">
    <instance_attributes id="resource__instance_attrs">
      <attributes>
        <nvpair id="babe7348-ace7-4802-b960-78b68175f00c" name="ipaddr" value="192.168.0.100"/>
        <nvpair id="9329362b-2213-41c3-83d8-b7aaa65c8816" name="password" value="bajau123"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="fafbbfdc-b1c3-4d31-a87a-33c79001ccd3" name="monitor" description="fence8" interval="15" timeout="15" start_delay="15" prereq="nothing" disabled="false" role="Started" on_fail="fence"/>
      <op id="bbbc7128-adcf-42b8-b3d0-6180d6428207" name="start" description="fence8" timeout="15" prereq="nothing" start_delay="0" disabled="false" role="Started"/>
    </operations>
    <meta_attributes id="resource_:0_meta_attrs">
      <attributes>
        <nvpair id="resource_:0_metaattr_target_role" name="target_role" value="started"/>
      </attributes>
    </meta_attributes>
  </primitive>
</clone>

I tried to add an operation with "On Fail: fence" to a resource.
This is what happens when I make the httpd resource fail by emptying httpd.conf:

Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_iphttp_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_iphttp_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: RecurringOp: station4.enterprise.com r_iphttp_1_monitor_10000
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Move resource r_fsmount_1 (station5.enterprise.com -> station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_fsmount_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_fsmount_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Recover resource r_serviceweb_1 (station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop r_serviceweb_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: StartRsc: station4.enterprise.com Start r_serviceweb_1
Apr 2 22:31:36 station4 pengine: [5007]: notice: RecurringOp: station4.enterprise.com r_serviceweb_1_monitor_15000
Apr 2 22:31:36 station4 pengine: [5007]: info: native_stop_constraints: resource_:1_stop_0 is implicit after station5.enterprise.com is fenced
Apr 2 22:31:36 station4 pengine: [5007]: info: native_stop_constraints: Re-creating actions for DoFencing
Apr 2 22:31:36 station4 pengine: [5007]: notice: NoRoleChange: Leave resource resource_:0 (station4.enterprise.com)
Apr 2 22:31:36 station4 pengine: [5007]: notice: StopRsc: station5.enterprise.com Stop resource_:1
Apr 2 22:31:36 station4 crmd: [2800]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Apr 2 22:31:36 station4 tengine: [5006]: info: unpack_graph: Unpacked transition 40: 18 actions in 18 synapses
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
Apr 2 22:31:36 station4 pengine: [5007]: WARN: process_pe_message: Transition 40: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-31.bz2
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 27 fired and confirmed
Apr 2 22:31:36 station4 pengine: [5007]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Apr 2 22:31:36 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)
Apr 2 22:31:36 station4 stonithd: [2798]: info: client tengine [pid: 5006] want a STONITH operation RESET to node station5.enterprise.com.
Apr 2 22:31:36 station4 stonithd: [2798]: info: Broadcasting the message succeeded: require others to stonith node station5.enterprise.com.
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
Apr 2 22:31:36 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 26 fired and confirmed
Apr 2 22:31:41 station4 stonithd: [8041]: info: Successful login to WTI Network Power Switch.
Apr 2 22:33:06 station4 pengine: [5007]: WARN: process_pe_message: Transition 43: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-34.bz2
Apr 2 22:33:06 station4 pengine: [5007]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Apr 2 22:33:06 station4 tengine: [5006]: info: unpack_graph: Unpacked transition 43: 18 actions in 18 synapses
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_pseudo_action: Pseudo action 27 fired and confirmed
Apr 2 22:33:06 station4 tengine: [5006]: info: te_fence_node: Executing reboot fencing operation (28) on station5.enterprise.com (timeout=30000)

Any insights and comments are welcome. Thank you in advance.

-- 
Fajar Priyanto | Reg'd Linux User #327841 | Linux tutorial http://linux2.arinet.org
20:55:08 up 2:03, 2.6.22-14-generic GNU/Linux
Let's use OpenOffice. http://www.openoffice.org
The real challenge of teaching is getting your students motivated to learn.
