On 11/29/2011 12:14 AM, Hal Martin wrote:
> Sorry; they were included in the previous email but it appears it was
> not properly spaced to be noticeable in the wall of text.

Indeed ... already there, sorry for the noise.

Strange ... where does this timeout come from? I don't see any evidence
that this fencing request ran for 60 seconds ...

Did you try to provoke a fencing action without using the crm shell?

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

>
> Syslog from sdgxen-3:
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_ais_dispatch: Update relayed from sdgxen-2
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_trigger_update: Sending flush op to all hosts for: terminate (true)
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_perform_update: Sent update 7: terminate=true
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node sdgxen-2 now has id: 2065306796
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node 2065306796 is now known as sdgxen-2
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: stonith_command: Processed st_query from sdgxen-2: rc=0
> Nov 28 15:01:21 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:22 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:23 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:24 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:25 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:26 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:14 sdgxen-3 stonith-ng: [452]: info: stonith_command: Processed st_query from sdgxen-2: rc=0
>
> Thanks,
> -Hal
>
> On Mon, Nov 28, 2011 at 6:10 PM, Andreas Kurz <[email protected]> wrote:
>> On 11/28/2011 08:07 PM, Hal Martin wrote:
>>> Thank you for the updated link.
>>>
>>> I have recompiled pacemaker from checkout b9889764 and stonith still
>>> fails to shoot nodes.
>>
>> Maybe posting also the logs from sdgxen-3 can help.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> sdgxen-2:/ # crm node fence sdgxen-3
>>> Do you really want to shoot sdgxen-3? y
>>>
>>> Syslog from sdgxen-2:
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: pe_fence_node: Node sdgxen-3 will be fenced because termination was requested
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: determine_online_status: Node sdgxen-3 is unclean
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: stage6: Scheduling Node sdgxen-3 for STONITH
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: notice: LogActions: Leave stonith-sbd(Started sdgxen-2)
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: unpack_graph: Unpacked transition 4: 4 actions in 4 synapses
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1322492480-29) derived from /var/lib/pengine/pe-warn-1278.bz2
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: te_pseudo_action: Pseudo action 5 fired and confirmed
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: te_fence_node: Executing reboot fencing operation (8) on sdgxen-3 (timeout=60000)
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info: initiate_remote_stonith_op: Initiating remote operation reboot for sdgxen-3: 76727be7-eecb-4778-857c-1a9288c63ee6
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info: can_fence_host_with_device: stonith-sbd can not fence sdgxen-3: dynamic-list
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info: stonith_command: Processed st_query from sdgxen-2: rc=0
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: process_pe_message: Transition 4: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-1278.bz2
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
>>> Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: ERROR: remote_op_query_timeout: Query 76727be7-eecb-4778-857c-1a9288c63ee6 for sdgxen-3 timed out
>>> Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: ERROR: remote_op_timeout: Action reboot (76727be7-eecb-4778-857c-1a9288c63ee6) for sdgxen-3 timed out
>>> Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: info: remote_op_done: Notifing clients of 76727be7-eecb-4778-857c-1a9288c63ee6 (reboot of sdgxen-3 from ee8c34db-0e5d-4227-aa46-0ad8b3f306d1 by (null)): 0, rc=-8
>>> Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: info: stonith_notify_client: Sending st_fence-notification to client 457/67849bf4-1881-48b9-a5e8-ab1f72116a81
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="sdgxen-3" st_op="reboot" />
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: tengine_stonith_callback: Stonith operation 2/8:4:0:bd203590-3295-4f31-a720-01760a5394e8: Operation timed out (-8)
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: ERROR: tengine_stonith_callback: Stonith of sdgxen-3 failed (-8)... aborting transition.
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: update_abort_priority: Abort action done superceeded by restart
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: ERROR: tengine_stonith_notify: Peer sdgxen-3 could not be terminated (reboot) by <anyone> for sdgxen-2 (ref=76727be7-eecb-4778-857c-1a9288c63ee6): Operation timed out
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: run_graph: ====================================================
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: notice: run_graph: Transition 4 (Complete=2, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pengine/pe-warn-1278.bz2): Stopped
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: te_graph_trigger: Transition 4 is now complete
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: do_pe_invoke: Query 81: Requesting the current CIB: S_POLICY_ENGINE
>>> Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: do_pe_invoke_callback: Invoking the PE: query=81, ref=pe_calc-dc-1322492486-30, seq=240, quorate=1
>>> Nov 28 15:01:26 sdgxen-2 pengine: [456]: WARN: pe_fence_node: Node sdgxen-3 will be fenced because termination was requested
>>> Nov 28 15:01:26 sdgxen-2 pengine: [456]: WARN: determine_online_status: Node sdgxen-3 is unclean
>>> Nov 28 15:01:26 sdgxen-2 pengine: [456]: WARN: stage6: Scheduling Node sdgxen-3 for STONITH
>>> Nov 28 15:01:26 sdgxen-2 pengine: [456]: notice: LogActions: Leave stonith-sbd(Started sdgxen-2)
>>>
>>> Syslog from sdgxen-3:
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_ais_dispatch: Update relayed from sdgxen-2
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_trigger_update: Sending flush op to all hosts for: terminate (true)
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_perform_update: Sent update 7: terminate=true
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node sdgxen-2 now has id: 2065306796
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node 2065306796 is now known as sdgxen-2
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: stonith_command: Processed st_query from sdgxen-2: rc=0
>>> Nov 28 15:01:21 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:22 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:23 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:24 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:25 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:26 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:14 sdgxen-3 stonith-ng: [452]: info: stonith_command: Processed st_query from sdgxen-2: rc=0
>>>
>>> sdgxen-2:/ # crm_verify -L -V
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: pe_fence_node: Node sdgxen-3 will be fenced because termination was requested
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: determine_online_status: Node sdgxen-3 is unclean
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: stage6: Scheduling Node sdgxen-3 for STONITH
>>> Warnings found during check: config may not be valid
>>>
>>> sdgxen-2:/ # crm configure show
>>> node sdgxen-2
>>> node sdgxen-3
>>> primitive stonith-sbd stonith:external/sbd \
>>>     meta is-managed="true" target-role="Started"
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.6-git" \
>>>     cluster-infrastructure="openais" \
>>>     stonith-enabled="true" \
>>>     stonith-timeout="60s" \
>>>     stonith-action="reboot" \
>>>     expected-quorum-votes="2"
>>>
>>> I appreciate any feedback on this issue.
>>>
>>> Thanks,
>>> Hal
>>>
>>> On Mon, Nov 28, 2011 at 10:44 AM, Florian Haas <[email protected]> wrote:
>>>> On Mon, Nov 28, 2011 at 4:35 PM, Hal Martin <[email protected]> wrote:
>>>>> Looking at the mercurial repository for pacemaker
>>>>> (http://hg.clusterlabs.org/pacemaker/) I do not see any check-ins
>>>>> since 1.1.6 was tagged two months ago.
>>>>
>>>> Pacemaker has since moved to GitHub:
>>>>
>>>> https://github.com/ClusterLabs/pacemaker
>>>>
>>>> Hope this helps.
>>>>
>>>> Cheers,
>>>> Florian
>>>>
>>>> --
>>>> Need help with Pacemaker?
>>>> http://www.hastexo.com/knowledge/pacemaker
>>>> _______________________________________________
>>>> Linux-HA mailing list
>>>> [email protected]
>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>> See also: http://linux-ha.org/ReportingProblems
>>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
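
The timing behind the question about the 60-second timeout can be checked directly from the quoted sdgxen-2 syslog. Below is a minimal Python sketch that takes the entry where crmd starts the fencing operation (timeout=60000) and the entry where stonith-ng reports the timeout, and computes the time between them. The helper name `syslog_time` is hypothetical, introduced only for this illustration, and the year 2011 is an assumption taken from the mail dates, since syslog timestamps omit the year.

```python
from datetime import datetime

# Two entries copied from the quoted sdgxen-2 syslog.
start_line = ("Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: te_fence_node: "
              "Executing reboot fencing operation (8) on sdgxen-3 (timeout=60000)")
end_line = ("Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: ERROR: remote_op_timeout: "
            "Action reboot (76727be7-eecb-4778-857c-1a9288c63ee6) for sdgxen-3 timed out")


def syslog_time(line, year=2011):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is not part of the stamp; 2011 is assumed from the mail dates.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Seconds between starting the fencing operation and the reported timeout.
elapsed = (syslog_time(end_line) - syslog_time(start_line)).total_seconds()
print(f"fencing operation reported a timeout after {elapsed:.0f}s "
      f"(stonith-timeout is configured as 60s)")
```

With these two entries the gap is a few seconds rather than 60, which fits the observation that nothing in the logs shows the request running for the full stonith-timeout.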
