On Thu, Feb 24, 2011 at 5:41 PM, Stallmann, Andreas <[email protected]> wrote:
> Hi again!
>
> I tried to think my setup through again, but I'm still not coming to any
> sensible conclusion.
>
> The stonith:suicide resource was set up as a clone resource, because that's
> how it's done in all the examples I found. Well, I didn't find a single
> example for "suicide", but that's at least how it's done for the other
> stonith agents.
>
> Could that be my error? Shouldn't the suicide resource be stopped on all
> nodes *with* quorum and be started only on the nodes which have *no*
> quorum? If I'm right, how is that accomplished?
>
> Strangely, according to the error messages in my logs (/var/log/messages),
> my disconnected system (mgmt03) is trying to stonith one (yes, only one; it
> always tries mgmt01, not mgmt02) of the other systems!
>
> Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_query_timeout: Query f7cbd271-ffa2-4015-a132-0107517d2ea1 for mgmt01 timed out
> Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_timeout: Action poweroff (f7cbd271-ffa2-4015-a132-0107517d2ea1) for mgmt01 timed out
> Feb 24 17:28:43 mgmt03 crmd: [5911]: ERROR: tengine_stonith_callback: Stonith of mgmt01 failed (-7)... aborting transition.
>
> Looking at the "warn" messages, one can see that stonith somehow wants to
> kill *all* nodes:
>
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt01 for STONITH
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt02 for STONITH
> Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt03 for STONITH
>
> And "info" reveals that stonith indeed tries to kill mgmt01:
>
> Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: log_data_element: stonith_query: Query <stonith_command t="stonith-ng" st_async_id="872fdb20-c172-417e-9a21-1233abc5a91a" st_op="st_query" st_callid="0" st_callopt="0" st_remote_op="872fdb20-c172-417e-9a21-1233abc5a91a" st_target="mgmt01" st_device_action="poweroff" st_clientid="940dcf86-d33a-4cbd-a9ea-1054af0b5e33" src="mgmt03" seq="1467" />
> Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: can_fence_host_with_device: suicide_res:2 can not fence mgmt01: dynamic-list
>
> Something is obviously going entirely wrong here...
The agent does not appear to support the 'list' command (correctly).
Try specifying pcmk_host_list="mgmt01 mgmt02 mgmt03" for suicide_res.

I suspect it will still fail, though: suicide isn't a supported fencing
option, since the other nodes obviously can't confirm that it happened.

> If any one of you has a functioning suicide-stonith solution running,
> please let me know how you do it.
>
> See below for my configuration (again).
>
> Thanks in advance,
>
> Andreas
>
> ~~~~~~Output from crm configure show~~~~~~~~~~
> primitive suicide_res stonith:suicide
> ...
> clone fenc_clon suicide_res
> ...
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         stonith-enabled="true" \
>         no-quorum-policy="suicide" \
>         stonith-action="poweroff"
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
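For reference, the pcmk_host_list suggestion above can be sketched in crm
shell as follows. This is an untested sketch, not a confirmed fix: it reuses
the suicide_res name and node names from the configuration above, and the
pcmk_host_check="static-list" parameter is my addition, not part of the
advice in the reply.

```shell
# Sketch only: give the fencing resource a static host list so stonith-ng
# stops relying on the agent's (broken) dynamic 'list' output.
# The resource already exists in the configuration shown above, so edit it
# in place (crm configure edit suicide_res) or delete and re-create it.
crm configure primitive suicide_res stonith:suicide \
    params pcmk_host_list="mgmt01 mgmt02 mgmt03" \
           pcmk_host_check="static-list"
```

Even with a static host list, the reply's caveat stands: the surviving nodes
have no way to verify that a suicide actually completed, so the poweroff
action for a lost node can still time out as in the logs above.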
