Any ideas?
--BO
On 4/4/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
>
> I do not know what op_result=2 means. I can only say that the drac4 RA
> will never have exit code 2. I am sure that the drac4 RA works as expected
> in all use cases and also when called via the stonith command from the
> command line.
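>
The exit-code claim above is easy to verify from the shell by capturing the agent's return code the way stonithd sees it. A minimal sketch — the real invocation would be heartbeat's stonith(8) utility (something like `stonith -t external/drac4 ... -S`; exact flags and parameter names should be checked against `stonith -h`), so `false` stands in here to keep the sketch runnable anywhere:

```shell
#!/bin/sh
# Capture and report a command's exit code, as stonithd would see it.
run_and_report() {
    "$@"
    echo "rc=$?"
}

# Stand-in for the real check, which would look something like:
#   stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain \
#           DRAC_LOGIN=root DRAC_PASSWD=... -S
# (flags and parameter names are assumptions; verify with stonith -h)
run_and_report false
```

If the RA really never exits with 2, this wrapper will never print `rc=2` across the plugin's sub-commands.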
>
> *Timeouts and Intervals*: I've increased these per your
> recommendations. Please let me know if they look better now.
> Constraints: That definitely makes more sense and I've updated to
> reflect this.
>
> As for the "cleaned up logs", the attached logs are full and have debug
> information. The only "cleaning" performed was to remove any sensitive
> company information. This is mandated and I must do so.
>
> Attached you will find ha.cf, heartbeat logs (with debug) from both
> > nodes, the output of cibadmin -Ql at several points of execution, and the
> > output of debug logging from the drac4 RA.
>
> I hope that this will be helpful. Thanks for the continued help.
> --BO
>
> On 4/4/07, Dejan Muhamedagic <[EMAIL PROTECTED] > wrote:
> >
> > On Tue, Apr 03, 2007 at 03:52:37PM -0400, Bjorn Oglefjorn wrote:
> > > Sorry Alan, I realize that this post is getting quite long. Here is
> > > a rundown of where I am currently.
> > >
> > > STONITH is failing and I'm still not sure why.
> >
> > Me neither. There's nothing in the logs apart from:
> >
> > Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH
> > the node test-2. domain: optype=1, op_result=2
> >
> > What does op_result=2 mean?
> >
> > I think I've already suggested that you add some debug output to
> > your stonith RA (you implemented it yourself, right?).
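A minimal sketch of such a debug prologue for an external stonith plugin (these plugins are plain shell scripts invoked with a sub-command such as reset/status/gethosts as $1). The log path default and filename are placeholders; on a real node you would point DEBUG_LOG somewhere persistent like /var/log:

```shell
#!/bin/sh
# Hypothetical debug prologue for an external stonith shell plugin.
# DEBUG_LOG defaults to /tmp here so the sketch runs anywhere.
DEBUG_LOG=${DEBUG_LOG:-/tmp/drac4-debug.log}
echo "$(date '+%b %d %H:%M:%S') $0 called with: $*" >>"$DEBUG_LOG"
exec 2>>"$DEBUG_LOG"   # route stderr (including the trace below) to the log
set -x                 # print every command as the plugin executes it
```

With this at the top of the script, every invocation and every command the plugin runs ends up in the log, which makes it obvious where a nonzero exit code comes from.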
> >
> > This is also no good:
> >
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> > native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> > active on 2 nodes.
> >
> > You sure that drac4 always returns good status?
> >
> > Timeouts: They are really way too short. I think there was
> > already a discussion on the list about that. Short timeouts
> > will usually work, but it may happen, and most probably will,
> > that they don't. Then you'll wonder what the heck went wrong.
> >
> > Intervals: Too often for my taste as I think I already said, but
> > hey, it's your cluster :)
> >
> > Constraints: I'm not sure if you want to allow test-1_DRAC to run
> > on test-1. Setting the score to INFINITY won't help if the preferred
> > node is down. It'd be better and more logical to go the other way
> > around: set -INFINITY for the nodes where it shouldn't run.
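The -INFINITY approach could be sketched as a rsc_location constraint in cib.xml. This assumes heartbeat v2 CIB syntax; the ids and the node name are placeholders to adapt:

```xml
<rsc_location id="loc-test-1_DRAC-not-on-test-1" rsc="test-1_DRAC">
  <rule id="loc-test-1_DRAC-not-on-test-1-rule" score="-INFINITY">
    <expression id="loc-test-1_DRAC-not-on-test-1-expr"
                attribute="#uname" operation="eq" value="test-1.domain"/>
  </rule>
</rsc_location>
```

With a rule like this, the resource is simply forbidden on its own node, and the cluster remains free to place it on any surviving node rather than chasing an INFINITY preference for a node that may be down.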
> >
> > And please go on and read again
> > http://linux-ha.org/ReportingProblems
> > It says, in a Bold type:
> >
> > Please don't send "cleaned up" logs.
> >
> > And there ain't no running CIB either.
> >
> > > I'm quite sure the plugin is working properly and my cib.xml seems
> > > sane enough. I'm having a hard time digesting the logs and
> > > understanding where the trouble is coming from. Attached again are
> > > my cib.xml and the logs from when heartbeat noticed test-2 was dead
> > > all the way up to STONITH failing for the second time...it loops
> > > and tries to STONITH the dead node indefinitely. It almost seems
> > > as if heartbeat is 'restarting', but like I said, I'm having a
> > > hard time getting useful information from the logs.
> > >
> > > Sorry if I'm being a pest, but I really don't like not figuring
> > > stuff out.
> > > Let me know if you need more info.
> > > --BO
> > >
> > > On 4/3/07, Alan Robertson < [EMAIL PROTECTED]> wrote:
> > > >
> > > >Bjorn Oglefjorn wrote:
> > > >> Anyone? Help?
> > > >> --BO
> > > >>
> > > >> On 4/2/07, Bjorn Oglefjorn < [EMAIL PROTECTED]> wrote:
> > > >>>
> > > >>> Any ideas as to what's going wrong here?
> > > >
> > > >There is so much send/reply/try/fail/fix stuff in the email that I
> > > >had trouble following what was going on.
> > > >
> > > >Could you try reposting this cleanly and explain what symptoms
> > > >you're seeing? I just saw "it doesn't work", and that's not very
> > > >helpful.
> > > >
> > > >> See also: http://linux-ha.org/ReportingProblems
> > > >
> > > >
> > > >--
> > > > Alan Robertson < [EMAIL PROTECTED]>
> > > >
> > > >"Openness is the foundation and preservative of friendship... Let
> > me
> > > >claim from you at all times your undisguised opinions." - William
> > > >Wilberforce
> > > >_______________________________________________
> > > >Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > >See also: http://linux-ha.org/ReportingProblems
> > > >
> >
> > > Mar 30 09:37:56 test-1 heartbeat: [836]: WARN: node test-2.domain:
> > is dead
> > > Mar 30 09:37:56 test-1 heartbeat: [836]: info: Link
> > test-2.domain:eth0 dead.
> > > Mar 30 09:37:56 test-1 crmd: [857]: notice: crmd_ha_status_callback:
> > callbacks.c Status update: Node test-2.domain now has status [dead]
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_NOT_PRIMARY from ccm
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event:
> > instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> > callbacks.c Quorum lost after event=NOT PRIMARY (id=2)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 148544 bytes (was 153264)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 24): 0.4.134 (ok)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_NOT_PRIMARY from ccm
> > > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event:
> > instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > > Mar 30 09:37:56 test-1 cib: [973]: info: write_cib_contents:io.c Wrote version 0.4.134 of the CIB to disk (digest: 9c22a56909f0417ecc1f3af561f3523a)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_INVALID from ccm
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_INVALID from ccm
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: no
> > mbr_track info
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: no
> > mbr_track info
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event:
> > instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event:
> > instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> > callbacks.c Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> > callbacks.c LOST: test-2.domain
> > > Mar 30 09:38:01 test-1 crmd: [857]: WARN: check_dead_member:ccm.c Our DC node (test-2.domain) left the cluster
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> > callbacks.c PEER: test-1.domain
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c NEW
> > MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 ccm: [852]: info: Break tie for 2 nodes
> > cluster
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c CURRENT: test-1.domain [nodeid=0, born=3]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail: ccm.c
> > LOST: test-2.domain [nodeid=1, born=1]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_NOT_DC -> S_ELECTION [
> > input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to <null> (<null>)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> > election.c Updated voted hash for test-1.domain to vote
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> > election.c Election ignore: our vote (test-1.domain)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain : State transition S_ELECTION -> S_INTEGRATION [
> > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:
> > subsystems.c Starting sub-system "tengine"
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:
> > subsystems.c Starting sub-system "pengine"
> > > Mar 30 09:38:01 test-1 tengine: [974]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 15
> > > Mar 30 09:38:01 test-1 tengine: [974]: info:
> > G_main_add_TriggerHandler: Added signal manual handler
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_dc_takeover:election.c Taking over DC status for this partition
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to <null> (<null>)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info:
> > do_dc_join_offer_all:join_dc.c join-1: Waiting on 1 outstanding join acks
> > > Mar 30 09:38:01 test-1 pengine: [975]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 15
> > > Mar 30 09:38:01 test-1 pengine: [975]: info: init_start:main.c Starting pengine
> > > Mar 30 09:38:01 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 148648 bytes (was 148544)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 25): 0.4.134 (ok)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_process_readwrite:
> > messages.c We are now in R/W mode
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:28): 0.4.134 -> 0.4.135 (ok)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_null_callback:
> > callbacks.c Setting cib_diff_notify callbacks for tengine: on
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Registering TE UUID: 7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: set_graph_functions:
> > utils.c Setting custom graph functions
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition -1: 0 actions in 0 synapses
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Starting tengine
> > > Mar 30 09:38:01 test-1 cib: [976]: info: write_cib_contents:io.c Wrote version 0.4.135 of the CIB to disk (digest: 676f407011dd2084813f1c414db97ce5)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc: utils.c Set DC
> > to test-1.domain (1.0.6)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: sync_our_cib:messages.c Syncing CIB to all peers
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain : State transition S_INTEGRATION -> S_FINALIZE_JOIN [
> > input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> > > Mar 30 09:38:01 test-1 attrd: [856]: info: attrd_local_callback:
> > attrd.c Sending full refresh
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes responded to the join offer.
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_attrd:join_dc.c
> > Connecting to attrd...
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:31): 0.4.135 -> 0.4.136 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.4.135 -> 0.4.136
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:32): 0.4.136 -> 0.5.137 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_bump): 0.4.136 -> 0.5.137
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:33): 0.5.137 -> 0.5.138 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.137 -> 0.5.138
> > > Mar 30 09:38:01 test-1 cib: [977]: info: write_cib_contents:io.c Wrote version 0.5.138 of the CIB to disk (digest: ff63aedd166b5e7c8880f380b334321e)
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to test-1.domain (1.0.6)
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_dc_join_ack:join_dc.c
> > join-1: Updating node state to member for test-1.domain )
> > > Mar 30 09:38:02 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:34): 0.5.138 -> 0.5.139 (ok)
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.138 -> 0.5.139
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
> > input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > > Mar 30 09:38:02 test-1 cib: [978]: info: write_cib_contents:io.c Wrote version 0.5.139 of the CIB to disk (digest: 86e0ea1b0a56ca84cf3d1f0e261f7816)
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:
> > [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2"
> > cib_feature_revision=" 1.3" generated="true" ccm_transition="3"
> > dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="139"/>
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > determine_online_status:unpack.c Node test-1.domain is online
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > determine_online_status_fencing: unpack.c ha_state=dead,
> > ccm_state=false, crm_state=online, join_state=down, expected=member
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > determine_online_status:unpack.c Node test-2.domain is unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: Resource Group:
> > test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test_IP (heartbeat::ocf:IPaddr): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > httpd (heartbeat::ocf:apache): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test-1_DRAC (stonith:external/drac4): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test-2_DRAC (stonith:external/drac4): Started test-1.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test-1_DRAC (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: StopRsc: native.c
> > test-1.domain Stop test-2_DRAC
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing test-2_DRAC from test-1.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling Node test-2.domain for STONITH
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource test_IP is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > native_stop_constraints: native.c Re-creating actions for test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource httpd is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > native_stop_constraints: native.c Re-creating actions for test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource test-1_DRAC
> > is implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: stage8:allocate.c Created transition graph 0.
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c No value specified for cluster preference:
> > pe-warn-series-max
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c Transition 0: WARNINGs found during PE processing. PEngine
> > Input stored in: /var/lib/heartbeat/pengine/pe-warn-45.bz2
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:
> > pengine.c Configuration WARNINGs found during PE processing. Please
> > run "crm_verify -L" to identify issues.
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE
> > [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition 0: 14 actions in 14 synapses
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 9 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 6 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 13 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 14: test-1_DRAC_start_0 on test-1.domain
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 15: test-2_DRAC_stop_0 on test-1.domain
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op start on test-1_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 3 confirmed
> > > Mar 30 09:38:02 test-1 lrmd: [979]: info: Try to start STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op stop on test-2_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:03 test-1 lrmd: [981]: info: Try to stop STONITH
> > resource <rsc_id=test-2_DRAC> : Device=external/drac4
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM operation (8) stop_0 on test-2_DRAC complete
> > > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 150940 bytes (was 148648)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:36): 0.5.139 -> 0.5.140 (ok)
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.139 -> 0.5.140
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_stop_0 (15) confirmed
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 16: test-2_DRAC_delete_0 on test-1.domain
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing resource test-2_DRAC from the LRM
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing resource op: delete for test-2_DRAC
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_delete_0 (16) confirmed
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Executing crm-event (17): lrm_refresh on test-1.domain
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Skipping wait for 17
> > > Mar 30 09:38:03 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> > get the value of field lrm_opstatus from a ha_msg
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: msg_to_op: Message
> > follows:
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with
> > 13 fields
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> > [lrm_rid=test-1_DRAC]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] :
> > [lrm_timeout=30000]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> >
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[8] :
> > [lrm_userdata=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[9] :
> > [(2)lrm_param=0x907df78(164 198)]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with
> > 6 fields
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=
> > test-1.drac.domain]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> > [CRM_meta_op_target_rc=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [DRAC_LOGIN=root]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] :
> > [DRAC_PASSWD=********]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] :
> > [CRM_meta_timeout=30000]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [crm_feature_set=
> > 1.0.6]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing a local LRM refresh
> > > Mar 30 09:38:03 test-1 cib: [985]: info: write_cib_contents:io.c Wrote version 0.5.140 of the CIB to disk (digest: da245127f5a42d016fecac8050eb99a1)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 144888 bytes (was 150940)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:37): 0.5.140 -> 0.5.141 (ok)
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.140 -> 0.5.141
> > > Mar 30 09:38:03 test-1 cib: [986]: info: write_cib_contents:io.c Wrote version 0.5.141 of the CIB to disk (digest: da6197ce54aa4e2715dd524f8f2a40c5)
> > > Mar 30 09:38:05 test-1 stonithd: [855]: WARN: G_SIG_dispatch:
> > Dispatch function for SIGCHLD took too long to execute: 350 ms (> 10 ms)
> > (GSource: 0x929b820)
> > > Mar 30 09:38:05 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM operation (7) start_0 on test-1_DRAC complete
> > > Mar 30 09:38:05 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:38): 0.5.141 -> 0.5.142 (ok)
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.141 -> 0.5.142
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_start_0 (14) confirmed
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: te_fence_node:actions.c Executing reboot fencing operation (18) on test-2.domain (timeout=15000)
> > > Mar 30 09:38:05 test-1 stonithd: [855]: info: client tengine [pid:
> > 974] want a STONITH operation RESET to node test-2.domain.
> > > Mar 30 09:38:05 test-1 stonithd: [855]: info: Broadcasting the
> > message succeeded: require others to stonith node test-2.domain.
> > > Mar 30 09:38:05 test-1 cib: [988]: info: write_cib_contents:io.c Wrote version 0.5.142 of the CIB to disk (digest: d5d12bf8f6bdd66647f4dc5cc27adb6e)
> > > Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH the
> > node test-2.domain: optype=1, op_result=2
> > > Mar 30 09:38:20 test-1 tengine: [974]: info:
> > tengine_stonith_callback: callbacks.c call=-2, optype=1, node_name=
> > test-2.domain, result=2, node_list=,
> > action=18;0:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:20 test-1 tengine: [974]: ERROR:
> > tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> > (2)... aborting transition.
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort action 0 superceeded by 2
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE
> > [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: run_graph:graph.c ====================================================
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > > Mar 30 09:38:20 test-1 tengine: [974]: notice: run_graph:graph.c Transition 0: (Complete=9, Pending=0, Fired=0, Skipped=5, Incomplete=0)
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:
> > [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2"
> > cib_feature_revision=" 1.3" generated="true" ccm_transition="3"
> > dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="142"/>
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > determine_online_status:unpack.c Node test-1.domain is online
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > determine_online_status_fencing: unpack.c ha_state=dead,
> > ccm_state=false, crm_state=online, join_state=down, expected=member
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > determine_online_status:unpack.c Node test-2.domain is unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> > native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> > active on 2 nodes.
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: Resource Group:
> > test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test_IP (heartbeat::ocf:IPaddr): Started test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > httpd (heartbeat::ocf:apache): Started test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test-1_DRAC (stonith:external/drac4)
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: 0 : test-1.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: 1 : test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test-2_DRAC (stonith:external/drac4): Stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: native_create_probe:
> > native.c test-1.domain: Created probe for test-2_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_create_actions:
> > native.c Attempting recovery of resource test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c
> > test-1.domain Stop test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: DeleteRsc:
> > native.c Removing test-1_DRAC from test-1.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c
> > test-2.domain Stop test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StartRsc: native.c
> > test-1.domain Start test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: stage6:
> > allocate.c Scheduling Node test-2.domain for STONITH
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource test_IP is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > native_stop_constraints:native.c Re-creating actions for test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource httpd is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > native_stop_constraints:native.c Re-creating actions for test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource test-1_DRAC
> > is implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: stage8:
> > allocate.c Created transition graph 1.
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c No value specified for cluster preference:
> > pe-error-series-max
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: process_pe_message:
> > pengine.c Transition 1: ERRORs found during PE processing. PEngine
> > Input stored in: /var/lib/heartbeat/pengine/pe- error-128.bz2
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:
> > pengine.c Configuration WARNINGs found during PE processing. Please
> > run "crm_verify -L" to identify issues.
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition: fsa.c
> > test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE
> > [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:20 test-1 tengine: [974]: info:
> > unpack_graph: unpack.c Unpacked transition 1: 17 actions in 17 synapses
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 10 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 7 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 14: test-1_DRAC_stop_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 17 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 3: test-2_DRAC_monitor_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op stop on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 4 confirmed
> > > Mar 30 09:38:21 test-1 lrmd: [989]: info: Try to stop STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op monitor on test-2_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (9) stop_0 on test-1_DRAC complete
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (10) monitor_0 on test-2_DRAC Error: (7) not running
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 147180 bytes (was 144888)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:40): 0.5.142 -> 0.5.143 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.142 -> 0.5.143
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_stop_0 (14) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 15: test-1_DRAC_delete_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:
> > lrm.c Removing resource test-1_DRAC from the LRM
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: send_direct_ack:
> > lrm.c ACK'ing resource op: delete for test-1_DRAC
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_delete_0 (15) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 18: test-1_DRAC_start_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Executing crm-event (16): lrm_refresh on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Skipping wait for 16
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op start on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 lrmd: [993]: info: Try to start STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:21 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> > get the value of field lrm_opstatus from a ha_msg
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: msg_to_op: Message
> > follows:
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with
> > 13 fields
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] :
> > [lrm_rid=test-1_DRAC]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] :
> > [lrm_timeout=30000]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[8] :
> > [lrm_userdata=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[9] :
> > [(2)lrm_param=0x90afd20(140 168)]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with
> > 5 fields
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=
> > test-1.drac.domain]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] : [DRAC_LOGIN=root]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] :
> > [DRAC_PASSWD=********]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] :
> > [CRM_meta_timeout=30000]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [crm_feature_set=
> > 1.0.6]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=11]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=11]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:
> > lrm.c Forcing a local LRM refresh
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 150940 bytes (was 147180)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:41): 0.5.143 -> 0.5.144 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.143 -> 0.5.144
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_monitor_0 (3) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 2: probe_complete on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 1 confirmed
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 146356 bytes (was 150940)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:42): 0.5.144 -> 0.5.145 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.144 -> 0.5.145
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:43): 0.5.145 -> 0.5.146 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.145 -> 0.5.146
> > > Mar 30 09:38:21 test-1 cib: [995]: info: write_cib_contents:
> > io.c Wrote version 0.5.146 of the CIB to disk (digest: 53f6e570e78f4cf92213e89be4ae0573)
> > > Mar 30 09:38:24 test-1 stonithd: [855]: WARN: G_SIG_dispatch:
> > Dispatch function for SIGCHLD took too long to execute: 350 ms (> 10 ms)
> > (GSource: 0x929b820)
> > > Mar 30 09:38:24 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (11) start_0 on test-1_DRAC complete
> > > Mar 30 09:38:24 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:44): 0.5.146 -> 0.5.147 (ok)
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.146 -> 0.5.147
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_start_0 (18) confirmed
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: te_fence_node:
> > actions.c Executing reboot fencing operation (19) on test-2.domain (timeout=15000)
> > > Mar 30 09:38:24 test-1 stonithd: [855]: info: client tengine [pid:
> > 974] want a STONITH operation RESET to node test-2.domain.
> > > Mar 30 09:38:24 test-1 stonithd: [855]: info: Broadcasting the
> > message succeeded: require others to stonith node test-2.domain.
> > > Mar 30 09:38:24 test-1 cib: [999]: info: write_cib_contents:
> > io.c Wrote version 0.5.147 of the CIB to disk (digest: 7978776db187ec2d2397921646f4053b)
> > > Mar 30 09:38:39 test-1 stonithd: [855]: info: Failed to STONITH the
> > node test-2.domain: optype=1, op_result=2
> > > Mar 30 09:38:39 test-1 tengine: [974]: info:
> > tengine_stonith_callback:callbacks.c call=-3, optype=1, node_name=
> > test-2.domain, result=2, node_list=,
> > action=19;1:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:39 test-1 tengine: [974]: ERROR:
> > tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> > (2)... aborting transition.
> > > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort action 0 superceeded by 2
> > > Mar 30 09:38:39 test-1 crmd: [857]: info: do_state_transition: fsa.c
> > test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE
> > [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:39 test-1 tengine: [974]: info:
> > run_graph: graph.c ====================================================
> > > <cib admin_epoch="0" epoch="0" num_updates="0">
> > > <configuration>
> > > <crm_config>
> > > <cluster_property_set id="cib-bootstrap-options">
> > > <attributes>
> > > <nvpair
> > id="cib-bootstrap-options-transition_idle_timeout"
> > name="transition_idle_timeout" value="5min"/>
> > > <nvpair id="cib-bootstrap-options-stonith_enabled"
> > name="stonith_enabled" value="true"/>
> > > <nvpair
> > id="cib-bootstrap-options-default_resource_stickiness"
> > name="default_resource_stickiness" value="1000"/>
> > > <nvpair id="cib-bootstrap-options-short_resource_names"
> > name="short_resource_names" value="true"/>
> > > <nvpair
> > id="cib-bootstrap-options-default_resource_failure_stickiness"
> > name="default_resource_failure_stickiness" value="-400"/>
> > > <nvpair id="cib-bootstrap-options-stonith_action"
> > name="stonith_action" value="reboot"/>
> > > <nvpair id="cib-bootstrap-options-remove_after_stop"
> > name="remove_after_stop" value="true"/>
> > > <nvpair id="cib-bootstrap-options-default_action_timeout"
> > name="default_action_timeout" value="30s"/>
> > > <nvpair id="cib-bootstrap-options-symmetric_cluster"
> > name="symmetric_cluster" value="true"/>
> > > <nvpair id="cib-bootstrap-options-no_quorum_policy"
> > name="no_quorum_policy" value="stop"/>
> > > <nvpair id="cib-bootstrap-options-stop_orphan_resources"
> > name="stop_orphan_resources" value="true"/>
> > > <nvpair id="cib-bootstrap-options-stop_orphan_actions"
> > name="stop_orphan_actions" value="true"/>
> > > <nvpair id="cib-bootstrap-options-is_managed_default"
> > name="is_managed_default" value="true"/>
> > > </attributes>
> > > </cluster_property_set>
> > > </crm_config>
> > > <nodes/>
> > > <resources>
> > > <group id="test_group">
> > > <primitive class="ocf" id="test_IP" provider="heartbeat"
> > type="IPaddr">
> > > <operations>
> > > <op id="test_IP_mon" interval="10s" name="monitor"
> > timeout="9s" on_fail="restart"/>
> > > <op id="test_IP_start" name="start" timeout="20s"
> > on_fail="restart" prereq="fencing"/>
> > > <op id="test_IP_stop" name="stop" timeout="20s"
> > on_fail="restart"/>
> > > </operations>
> > > <instance_attributes id="test_IP_inst_attr">
> > > <attributes>
> > >             <nvpair id="test_IP_attr_0" name="ip" value="192.168.168.168"/>
> > > <nvpair id="test_IP_attr_1" name="netmask"
> > value="25"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > <primitive class="ocf" id="httpd" provider="heartbeat"
> > type="apache">
> > > <operations>
> > > <op id="httpd_mon" name="monitor" interval="10s"
> > timeout="9s" on_fail="restart"/>
> > > <op id="httpd_start" name="start" timeout="20s"
> > on_fail="restart" prereq="fencing"/>
> > > <op id="httpd_stop" name="stop" timeout="20s"
> > on_fail="restart"/>
> > > </operations>
> > > </primitive>
> > > </group>
> > > <primitive id="test-1_DRAC" class="stonith"
> > type="external/drac4" provider="heartbeat">
> > > <operations>
> > > <op id="test-1_DRAC_reset" name="reset" timeout="3m"
> > prereq="nothing"/>
> > > </operations>
> > > <instance_attributes id="test-1_DRAC_inst_attr">
> > > <attributes>
> > >             <nvpair id="test-1_DRAC_attr_0" name="DRAC_ADDR" value="test-1.drac.domain"/>
> > > <nvpair id="test-1_DRAC_attr_1" name="DRAC_LOGIN"
> > value="root"/>
> > > <nvpair id="test-1_DRAC_attr_2" name="DRAC_PASSWD"
> > value="********"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > <primitive id="test-2_DRAC" class="stonith"
> > type="external/drac4" provider="heartbeat">
> > > <operations>
> > > <op id="test-2_DRAC_reset" name="reset" timeout="3m"
> > prereq="nothing"/>
> > > </operations>
> > > <instance_attributes id="test-2_DRAC_inst_attr">
> > > <attributes>
> > > <nvpair id="test-2_DRAC_attr_0" name="DRAC_ADDR"
> > value="test-2.drac.domain"/>
> > > <nvpair id="test-2_DRAC_attr_1" name="DRAC_LOGIN"
> > value="root"/>
> > > <nvpair id="test-2_DRAC_attr_2" name="DRAC_PASSWD"
> > value="********"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > </resources>
> > > <constraints>
> > > <rsc_location id="test_group_location_test-1"
> > rsc="test_group">
> > > <rule id="prefered_location_test_group_test-1"
> > score="1000">
> > >           <expression attribute="#uname" id="prefered_location_test_group_test-1_expr_1" operation="eq" value="test-1.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test_group_location_test-2"
> > rsc="test_group">
> > > <rule id="prefered_location_test_group_test-2"
> > score="1000">
> > >           <expression attribute="#uname" id="prefered_location_test_group_test-2_expr_1" operation="eq" value="test-2.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test-1_DRAC_location" rsc="test-1_DRAC">
> > > <rule id="prefered_location_test-1_DRAC" score="INFINITY">
> > >           <expression attribute="#uname" id="prefered_location_test-1_DRAC_expr_1" operation="eq" value="test-2.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test-2_DRAC_location" rsc="test-2_DRAC">
> > > <rule id="prefered_location_test-2_DRAC" score="INFINITY">
> > >           <expression attribute="#uname" id="prefered_location_test-2_DRAC_expr_1" operation="eq" value="test-1.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_order id="Ord_websrv" from="httpd" type="after"
> > to="test_IP"/>
> > > <rsc_colocation id="Colo_not_same_DRAC" from="test-1_DRAC"
> > to="test-2_DRAC" score="-INFINITY"/>
> > > </constraints>
> > > </configuration>
> > > <status/>
> > > </cib>
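On the debug-output suggestion: external/* stonith plugins are plain shell scripts that receive the requested action (status, reset, gethosts, ...) as $1 and their parameters (DRAC_ADDR, DRAC_LOGIN, ...) as environment variables, so a small tracing helper can record every invocation and its exit code. This is only a sketch under those assumptions; the log path, function names, and placeholder dispatch are illustrative, not the real drac4 plugin:

```shell
#!/bin/sh
# Hypothetical tracing shim for an external stonith plugin (e.g. external/drac4).
# Assumption: the plugin is a shell script called with the action as $1 and
# configured via environment variables such as DRAC_ADDR.

DEBUG_LOG=${DEBUG_LOG:-/tmp/drac4-debug.log}   # assumed path, adjust as needed

drac4_trace() {
    # Append a timestamped record of the action and its exit code.
    printf '%s drac4: action=%s DRAC_ADDR=%s rc=%s\n' \
        "$(date '+%b %d %T')" "$1" "${DRAC_ADDR:-unset}" "$2" >>"$DEBUG_LOG"
}

drac4_dispatch() {
    # Placeholder dispatch; the real plugin would contact the DRAC here.
    case "$1" in
        gethosts|getconfignames|status) rc=0 ;;
        reset|on|off)                   rc=0 ;;  # substitute the real DRAC call
        *)                              rc=1 ;;  # unknown action
    esac
    drac4_trace "$1" "$rc"
    return $rc
}
```

Pasting a helper like drac4_trace into the plugin and calling it before every exit would show exactly which action stonithd invoked and what the script returned, which should help pin down where op_result=2 comes from.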
> >
> >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> >
> > --
> > Dejan
> >
>
>
>