Hi,

On Wed, Feb 04, 2009 at 10:35:03AM -0800, Gruher, Joseph R wrote:
> Hello again.  We have not received any response so I thought I
> would send this once more.  Can anyone provide any guidance for
> us?  Is there a way to get more output from the STONITH daemon
> to help us determine why it might not issue the reset command
> to our plugin script?

Hope that the other reply helped.

Thanks,

Dejan

P.S. A bit hectic lately, hence late responses.

> Thanks much,
> Joe
> 
> ________________________________
> From: Gruher, Joseph R
> Sent: Monday, February 02, 2009 3:26 PM
> To: General Linux-HA mailing list
> Cc: Gruher, Joseph R; Liu, Zheng-yang
> Subject: Help with STONITH Plugin
> 
> Hello-
> 
> We are developing our own STONITH plugin for our blade server and having an 
> issue we are hoping this list can help us with.  Our STONITH plugin script 
> works well some of the time (the bad node is reset and the resources fail 
> over) but does not work some of the time (resources never fail over).  We 
> have been looking in the messages log (see examples below) and it appears in 
> the non-working case that the reset call to our plugin never occurs, even 
> though the getconfignames, status and gethosts calls that normally lead up to 
> it are made.  Are there any common problem that could cause this behavior?  
> Any suggestions how we can continue to debug this problem?  Other logs we 
> should be looking in?  Would it be useful to send our plugin or any other 
> files from the system?
> 
> Thanks very much for any and all input.  We are testing with SLES10.2 x64.
> 
> -Joe
> 
> 
> When it works:
> 
> Jan 28 11:33:09 25node2 crmd: [6325]: info: do_lrm_rsc_op: Performing 
> op=STONITH_v1.1_start_0 key=16:47:676b4bb5-523b-49c5-a5f3-a228f2af8149)
> Jan 28 11:33:09 25node2 lrmd: [6322]: info: rsc:STONITH_v1.1: start
> Jan 28 11:33:09 25node2 lrmd: [14675]: info: Try to start STONITH resource 
> <rsc_id=STONITH_v1.1> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]: getconfignames 
> (node2slot=; slot=)
> Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14676]: exiting script with 
> an rc=0
> Jan 28 11:33:09 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: status 
> (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha, member_count=2, 
> member_quorum_votes=200
> Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3, 
> total_quorum_votes=300
> Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: no mbr_track info
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: quorum plugin: majority
> Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: cluster:linux-ha, member_count=2, 
> member_quorum_votes=200
> Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Jan 28 11:33:09 25node2 cib: [6321]: info: mem_handle_event: instance=16, 
> nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> Jan 28 11:33:09 25node2 ccm: [6320]: debug: total_node_count=3, 
> total_quorum_votes=300
> Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: no mbr_track 
> info
> Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: LOST: 25node1
> Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER: 25node2
> Jan 28 11:33:09 25node2 crmd: [6325]: info: mem_handle_event: instance=16, 
> nodes=2, new=0, lost=1, n_idx=0, new_idx=2, old_idx=5
> Jan 28 11:33:09 25node2 cib: [6321]: info: cib_ccm_msg_callback: PEER: 25node3
> Jan 28 11:33:09 25node2 crmd: [6325]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=16)
> Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail: NEW MEMBERSHIP: 
> trans=16, nodes=2, new=0, lost=1 n_idx=0, new_idx=2, old_idx=5
> Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:     CURRENT: 
> 25node2 [nodeid=1, born=2]
> Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:     CURRENT: 
> 25node3 [nodeid=2, born=3]
> Jan 28 11:33:09 25node2 crmd: [6325]: info: ccm_event_detail:     LOST:    
> 25node1 [nodeid=0, born=15]
> Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping 
> WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping 
> WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping 
> WebServer_monitor_0 (rc=1) on 25node2 to an ERROR
> Jan 28 11:33:09 25node2 mgmtd: [6326]: ERROR: unpack_rsc_op: Remapping 
> WebServer_monitor_0 (rc=1) on 25node3 to an ERROR
> Jan 28 11:33:10 25node2 haclient: on_event: from message queue: 
> evt:cib_changed
> Jan 28 11:33:11 25node2 MFSYS_STONITH_PLUGIN_v1.1[14687]: exiting script with 
> an rc=0
> Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: gethosts 
> (node2slot=25node1=1,25node2=2,25node3=3; slot=)
> Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14703]: exiting script with 
> an rc=0
> Jan 28 11:33:12 25node2 crmd: [6325]: info: process_lrm_event: LRM operation 
> STONITH_v1.1_start_0 (call=24, rc=0) complete
> Jan 28 11:33:12 25node2 tengine: [6860]: info: match_graph_event: Action 
> STONITH_v1.1_start_0 (16) confirmed on 25node2 (rc=0)
> Jan 28 11:33:12 25node2 tengine: [6860]: info: te_pseudo_action: Pseudo 
> action 17 fired and confirmed
> Jan 28 11:33:12 25node2 tengine: [6860]: info: te_fence_node: Executing 
> reboot fencing operation (18) on 25node1 (timeout=50000)
> Jan 28 11:33:12 25node2 haclient: on_event:evt:cib_changed
> Jan 28 11:33:12 25node2 stonithd: [6323]: info: client tengine [pid: 6860] 
> want a STONITH operation RESET to node 25node1.
> Jan 28 11:33:12 25node2 stonithd: [6323]: info: 
> stonith_operate_locally::2368: sending fencing op (RESET) for 25node1 to 
> device external (rsc_id=STONITH_v1.1, pid=14713)
> Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: reset 25node1 
> (node2slot=25node1=1,25node2=2,25node3=3; slot=1)
> Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: retry=0 
> bladeState=-1 powerstate=-32
> Jan 28 11:33:12 25node2 MFSYS_STONITH_PLUGIN_v1.1[14714]: exiting script with 
> an rc=0
> 
> 
> When it fails:
> 
> Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op: Performing 
> op=CBSTONITH_monitor_0 key=4:8:469db639-f549-4582-9735-2b5e89d147c2)
> Jan 30 15:54:27 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: monitor
> Jan 30 15:54:27 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM 
> operation CBSTONITH_monitor_0 (call=20, rc=7) complete
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: write_cib_contents: Wrote 
> version 0.1175.3 of the CIB to disk (digest: 73c66f86b10cd0b1bc58a3beab870faa)
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:27 vs-cb03-5cl cib: [30766]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:28 vs-cb03-5cl crmd: [5116]: info: do_lrm_rsc_op: Performing 
> op=CBSTONITH_start_0 key=19:8:469db639-f549-4582-9735-2b5e89d147c2)
> Jan 30 15:54:28 vs-cb03-5cl lrmd: [5113]: info: rsc:CBSTONITH: start
> Jan 30 15:54:28 vs-cb03-5cl lrmd: [30768]: info: Try to start STONITH 
> resource <rsc_id=CBSTONITH> : Device=external/MFSYS_STONITH_PLUGIN_v1.1
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: write_cib_contents: Wrote 
> version 0.1175.5 of the CIB to disk (digest: f2c35efbf0c7066043202d754cf607bb)
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:28 vs-cb03-5cl cib: [30767]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]: getconfignames 
> (node2slot=; slot=)
> Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30769]: exiting script 
> with an rc=0
> Jan 30 15:54:29 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: status 
> (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: write_cib_contents: Wrote 
> version 0.1175.7 of the CIB to disk (digest: 551519e2dfb361ccf0f4303167997435)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jan 30 15:54:29 vs-cb03-5cl cib: [30795]: info: retrieveCib: Reading cluster 
> configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: 
> /var/lib/heartbeat/crm/cib.xml.sig.last)
> Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30780]: exiting script 
> with an rc=0
> Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: gethosts 
> (node2slot=vs-cb03-2cl=2,vs-cb03-5cl=5,vs-cb03-6cl=6; slot=)
> Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: JOE02: hostlist 
> = vs-cb03-2cl vs-cb03-5cl vs-cb03-6cl
> Jan 30 15:54:31 vs-cb03-5cl MFSYS_STONITH_PLUGIN_v1.1[30797]: exiting script 
> with an rc=0
> Jan 30 15:54:31 vs-cb03-5cl crmd: [5116]: info: process_lrm_event: LRM 
> operation CBSTONITH_start_0 (call=21, rc=0) complete
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to