On 11/29/2011 12:14 AM, Hal Martin wrote:
> Sorry; they were included in the previous email but it appears it was
> not properly spaced to be noticeable in the wall of text.

Indeed ... already there, sorry for the noise.

strange ... where does this timeout come from? I don't see an evidence
this fencing request ran for 60sec ...

Did you try to provoke a fencing action without using crm shell?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Syslog from sdgxen-3:
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_ais_dispatch:
> Update relayed from sdgxen-2
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_trigger_update:
> Sending flush op to all hosts for: terminate (true)
> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_perform_update:
> Sent update 7: terminate=true
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node
> sdgxen-2 now has id: 2065306796
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node
> 2065306796 is now known as sdgxen-2
> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: stonith_command:
> Processed st_query from sdgxen-2: rc=0
> Nov 28 15:01:21 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:22 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:23 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:24 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:25 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:26 sdgxen-3 sbd: [442]: info: Latency: 1
> Nov 28 15:01:14 sdgxen-3 stonith-ng: [452]: info: stonith_command:
> Processed st_query from sdgxen-2: rc=0
> 
> Thanks,
> -Hal
> 
> On Mon, Nov 28, 2011 at 6:10 PM, Andreas Kurz <[email protected]> wrote:
>> On 11/28/2011 08:07 PM, Hal Martin wrote:
>>> Thank you for the updated link.
>>>
>>> I have recompiled pacemaker from checkout b9889764 and stonith still
>>> fails to shoot nodes.
>>
>> Maybe posting also the logs from sdgxen-3 can help.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> sdgxen-2:/ # crm node fence sdgxen-3
>>> Do you really want to shoot sdgxen-3? y
>>>
>>> Syslog from sdgxen-2:
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: pe_fence_node: Node
>>> sdgxen-3 will be fenced because termination was requested
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN:
>>> determine_online_status: Node sdgxen-3 is unclean
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: stage6: Scheduling Node
>>> sdgxen-3 for STONITH
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: notice: LogActions: Leave
>>> stonith-sbd(Started sdgxen-2)
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: do_state_transition: State
>>> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
>>> cause=C_IPC_MESSAGE origin=handle_response ]
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: unpack_graph: Unpacked
>>> transition 4: 4 actions in 4 synapses
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: do_te_invoke: Processing
>>> graph 4 (ref=pe_calc-dc-1322492480-29) derived from
>>> /var/lib/pengine/pe-warn-1278.bz2
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: te_pseudo_action: Pseudo
>>> action 5 fired and confirmed
>>> Nov 28 15:01:20 sdgxen-2 crmd: [457]: info: te_fence_node: Executing
>>> reboot fencing operation (8) on sdgxen-3 (timeout=60000)
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info:
>>> initiate_remote_stonith_op: Initiating remote operation reboot for
>>> sdgxen-3: 76727be7-eecb-4778-857c-1a9288c63ee6
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info:
>>> can_fence_host_with_device: stonith-sbd can not fence sdgxen-3:
>>> dynamic-list
>>> Nov 28 15:01:20 sdgxen-2 stonith-ng: [452]: info: stonith_command:
>>> Processed st_query from sdgxen-2: rc=0
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: WARN: process_pe_message:
>>> Transition 4: WARNINGs found during PE processing. PEngine Input
>>> stored in: /var/lib/pengine/pe-warn-1278.bz2
>>> Nov 28 15:01:20 sdgxen-2 pengine: [456]: notice: process_pe_message:
>>> Configuration WARNINGs found during PE processing.  Please run
>>> "crm_verify -L" to identify issues.
>>> Nov 28 15:01:26 sdgxen-2 stonith-ng: [452]: ERROR:
>>> remote_op_query_timeout: Query 76727be7-eecb-4778-857c-1a9288c63ee6
>>> for sdgxen-3 timed outNov 28 15:01:26 sdgxen-2 stonith-ng: [452]:
>>> ERROR: remote_op_timeout: Action reboot
>>> (76727be7-eecb-4778-857c-1a9288c63ee6) for sdgxen-3 timed outNov 28
>>> 15:01:26 sdgxen-2 stonith-ng: [452]: info: remote_op_done: Notifing
>>> clients of 76727be7-eecb-4778-857c-1a9288c63ee6 (reboot of sdgxen-3
>>> from ee8c34db-0e5d-4227-aa46-0ad8b3f306d1 by (null)): 0, rc=-8Nov 28
>>> 15:01:26 sdgxen-2 stonith-ng: [452]: info: stonith_notify_client:
>>> Sending st_fence-notification to client
>>> 457/67849bf4-1881-48b9-a5e8-ab1f72116a81Nov 28 15:01:26 sdgxen-2 crmd:
>>> [457]: info: tengine_stonith_callback: StonithOp <remote-op state="0"
>>> st_target="sdgxen-3" st_op="reboot" />Nov 28 15:01:26 sdgxen-2 crmd:
>>> [457]: info: tengine_stonith_callback: Stonith operation
>>> 2/8:4:0:bd203590-3295-4f31-a720-01760a5394e8: Operation timed out
>>> (-8)Nov 28 15:01:26 sdgxen-2 crmd: [457]: ERROR:
>>> tengine_stonith_callback: Stonith of sdgxen-3 failed (-8)... aborting
>>> transition.Nov 28 15:01:26 sdgxen-2 crmd: [457]: info:
>>> abort_transition_graph: tengine_stonith_callback:454 - Triggered
>>> transition abort (complete=0) : Stonith failedNov 28 15:01:26 sdgxen-2
>>> crmd: [457]: info: update_abort_priority: Abort priority upgraded from
>>> 0 to 1000000Nov 28 15:01:26 sdgxen-2 crmd: [457]: info:
>>> update_abort_priority: Abort action done superceeded by restartNov 28
>>> 15:01:26 sdgxen-2 crmd: [457]: ERROR: tengine_stonith_notify: Peer
>>> sdgxen-3 could not be terminated (reboot) by <anyone> for sdgxen-2
>>> (ref=76727be7-eecb-4778-857c-1a9288c63ee6): Operation timed outNov 28
>>> 15:01:26 sdgxen-2 crmd: [457]: info: run_graph:
>>> ====================================================Nov 28 15:01:26
>>> sdgxen-2 crmd: [457]: notice: run_graph: Transition 4 (Complete=2,
>>> Pending=0, Fired=0, Skipped=2, Incomplete=0,
>>> Source=/var/lib/pengine/pe-warn-1278.bz2): StoppedNov 28 15:01:26
>>> sdgxen-2 crmd: [457]: info: te_graph_trigger: Transition 4 is now
>>> completeNov 28 15:01:26 sdgxen-2 crmd: [457]: info:
>>> do_state_transition: State transition S_TRANSITION_ENGINE ->
>>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
>>> origin=notify_crmd ]Nov 28 15:01:26 sdgxen-2 crmd: [457]: info:
>>> do_state_transition: All 2 cluster nodes are eligible to run
>>> resources.Nov 28 15:01:26 sdgxen-2 crmd: [457]: info: do_pe_invoke:
>>> Query 81: Requesting the current CIB: S_POLICY_ENGINENov 28 15:01:26
>>> sdgxen-2 crmd: [457]: info: do_pe_invoke_callback: Invoking the PE:
>>> query=81, ref=pe_calc-dc-1322492486-30, seq=240, quorate=1Nov 28
>>> 15:01:26 sdgxen-2 pengine: [456]: WARN: pe_fence_node: Node sdgxen-3
>>> will be fenced because termination was requestedNov 28 15:01:26
>>> sdgxen-2 pengine: [456]: WARN: determine_online_status: Node sdgxen-3
>>> is uncleanNov 28 15:01:26 sdgxen-2 pengine: [456]: WARN: stage6:
>>> Scheduling Node sdgxen-3 for STONITHNov 28 15:01:26 sdgxen-2 pengine:
>>> [456]: notice: LogActions: Leave   stonith-sbd(Started sdgxen-2)
>>> Syslog from sdgxen-3:
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_ais_dispatch:
>>> Update relayed from sdgxen-2
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: terminate (true)
>>> Nov 28 15:01:20 sdgxen-3 attrd: [455]: notice: attrd_perform_update:
>>> Sent update 7: terminate=true
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node
>>> sdgxen-2 now has id: 2065306796
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: crm_new_peer: Node
>>> 2065306796 is now known as sdgxen-2
>>> Nov 28 15:01:20 sdgxen-3 stonith-ng: [452]: info: stonith_command:
>>> Processed st_query from sdgxen-2: rc=0
>>> Nov 28 15:01:21 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:22 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:23 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:24 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:25 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:26 sdgxen-3 sbd: [442]: info: Latency: 1
>>> Nov 28 15:01:14 sdgxen-3 stonith-ng: [452]: info: stonith_command:
>>> Processed st_query from sdgxen-2: rc=0
>>>
>>> sdgxen-2:/ # crm_verify -L -V
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: pe_fence_node: Node
>>> sdgxen-3 will be fenced because termination was requested
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: determine_online_status:
>>> Node sdgxen-3 is unclean
>>> crm_verify[572]: 2011/11/28_15:06:01 WARN: stage6: Scheduling Node
>>> sdgxen-3 for STONITH
>>> Warnings found during check: config may not be valid
>>>
>>> sdgxen-2:/ # crm configure show
>>> node sdgxen-2
>>> node sdgxen-3
>>> primitive stonith-sbd stonith:external/sbd \
>>>         meta is-managed="true" target-role="Started"
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.6-git" \
>>>         cluster-infrastructure="openais" \
>>>         stonith-enabled="true" \
>>>         stonith-timeout="60s" \
>>>         stonith-action="reboot" \
>>>         expected-quorum-votes="2"
>>>
>>> I appreciate any feedback on this issue.
>>>
>>> Thanks,
>>> Hal
>>>
>>> On Mon, Nov 28, 2011 at 10:44 AM, Florian Haas <[email protected]> wrote:
>>>> On Mon, Nov 28, 2011 at 4:35 PM, Hal Martin <[email protected]> wrote:
>>>>> Looking at the mercurial repository for pacemaker
>>>>> (http://hg.clusterlabs.org/pacemaker/) I do not see any check-ins
>>>>> since 1.1.6 was tagged two months ago.
>>>>
>>>> Pacemaker has since moved to GitHub:
>>>>
>>>> https://github.com/ClusterLabs/pacemaker
>>>>
>>>> Hope this helps.
>>>>
>>>> Cheers,
>>>> Florian
>>>>
>>>> --
>>>> Need help with Pacemaker?
>>>> http://www.hastexo.com/knowledge/pacemaker
>>>> _______________________________________________
>>>> Linux-HA mailing list
>>>> [email protected]
>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>> See also: http://linux-ha.org/ReportingProblems
>>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems



Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to