On Thu, Feb 24, 2011 at 5:41 PM, Stallmann, Andreas <[email protected]> wrote:
> Hi again!
>
> I tried to think my setup through again, but I'm still not coming to any
> sensible conclusion.
>
> The stonith:suicide resource was set up as a clone resource, because that's
> how it's done in all the examples I found. Well, I didn't find a single
> example for "suicide", but that's at least how it's done for the other suicide
> agents.
>
> Could that be my error? Shouldn't the suicide resource be stopped on all
> nodes *with* quorum and be started only on the nodes that have *no*
> quorum? If I'm right, how is that accomplished?
>
> Strangely, according to the error messages in my logs (/var/log/messages), my
> disconnected system (mgmt03) is trying to stonith one of the other systems
> (yes, only one: it always tries mgmt01, never mgmt02)!
>
> Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_query_timeout: Query f7cbd271-ffa2-4015-a132-0107517d2ea1 for mgmt01 timed out
> Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_timeout: Action poweroff (f7cbd271-ffa2-4015-a132-0107517d2ea1) for mgmt01 timed out
> Feb 24 17:28:43 mgmt03 crmd: [5911]: ERROR: tengine_stonith_callback: Stonith of mgmt01 failed (-7)... aborting transition.
>
> Looking at the "warn" messages, one can see that stonith apparently wants to
> kill *all* nodes:
>
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt01 for STONITH
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt02 for STONITH
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt03 for STONITH
>
> And "info" reveals that stonith does indeed try to kill mgmt01:
>
> Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: log_data_element: stonith_query: Query <stonith_command t="stonith-ng" st_async_id="872fdb20-c172-417e-9a21-1233abc5a91a" st_op="st_query" st_callid="0" st_callopt="0" st_remote_op="872fdb20-c172-417e-9a21-1233abc5a91a" st_target="mgmt01" st_device_action="poweroff" st_clientid="940dcf86-d33a-4cbd-a9ea-1054af0b5e33" src="mgmt03" seq="1467" />
> Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: can_fence_host_with_device: suicide_res:2 can not fence  mgmt01: dynamic-list
>
> Something is obviously going entirely wrong here...

The agent does not appear to support the 'list' command (correctly).
Try specifying pcmk_host_list="mgmt01 mgmt02 mgmt03" for suicide_res.
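In crm shell syntax, the suggestion would look roughly like the fragment below. It is only a sketch. The other parameters shown as "..." in your primitive are left out here and would still need to be carried over. The node names are the ones from your logs.

```
# Sketch: give the suicide device a static host list so stonith-ng does not
# rely on the agent's 'list' output (other primitive parameters omitted)
primitive suicide_res stonith:suicide \
        params pcmk_host_list="mgmt01 mgmt02 mgmt03"
clone fenc_clon suicide_res
```

The intent is for stonith-ng to take the set of fenceable nodes from this list rather than from the agent's 'list' command. Depending on the Pacemaker version, pcmk_host_check may also need to be set. That is an assumption worth checking against the documentation for your version.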

I suspect it will still fail, though: suicide isn't a supported fencing
option, since the other nodes obviously can't confirm that it happened.

> If anyone has a working suicide STONITH setup, please let me know how
> you did it.
>
> See below for my configuration (again).
>
> Thanks in advance,
>
> Andreas
>
> ~~~~~~Output from crm configure show~~~~~~~~~~
> primitive suicide_res stonith:suicide ...
> clone fenc_clon suicide_res
> ...
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="3" \
>        stonith-enabled="true" \
>        no-quorum-policy="suicide" \
>        stonith-action="poweroff"
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems