Any ideas?
--BO
On 4/4/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
>
> I do not know what op_result=2 means. I can only say that the drac4 RA
> will never have exit code 2. I am sure that the drac4 RA works as expected
> in all use cases and also when called via the stonith command from the
> command line.
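>
The exit-code claim above is easy to verify from the shell by capturing the agent's return code the way stonithd sees it. A minimal sketch — the real invocation would be heartbeat's stonith(8) utility (something like `stonith -t external/drac4 ... -S`; exact flags and parameter names should be checked against `stonith -h`), so `false` stands in here to keep the sketch runnable anywhere:

```shell
#!/bin/sh
# Capture and report a command's exit code, as stonithd would see it.
run_and_report() {
    "$@"
    echo "rc=$?"
}

# Stand-in for the real check, which would look something like:
#   stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain \
#           DRAC_LOGIN=root DRAC_PASSWD=... -S
# (flags and parameter names are assumptions; verify with stonith -h)
run_and_report false
```

If the RA really never exits with 2, this wrapper will never print `rc=2` across the plugin's sub-commands.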
>
> *Timeouts and Intervals*: I've increased these per your
> recommendations. Please let me know if they look better now.
> Constraints: That definitely makes more sense and I've updated to
> reflect this.
>
> As for the "cleaned up logs", the attached logs are full and have debug
> information. The only "cleaning" performed was to remove any sensitive
> company information. This is mandated and I must do so.
>
> Attached you will find ha.cf, heartbeat logs (with debug) from both
> > nodes, the output of cibadmin -Ql at several points of execution, and the
> > output of debug logging from the drac4 RA.
>
> I hope that this will be helpful. Thanks for the continued help.
> --BO
>
> On 4/4/07, Dejan Muhamedagic <[EMAIL PROTECTED] > wrote:
> >
> > On Tue, Apr 03, 2007 at 03:52:37PM -0400, Bjorn Oglefjorn wrote:
> > > Sorry Alan, I realize that this post is getting quite long. Here is
> > > a rundown of where I am currently.
> > >
> > > STONITH is failing and I'm still not sure why.
> >
> > Me neither. There's nothing in the logs apart from:
> >
> > Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH
> > the node test-2. domain: optype=1, op_result=2
> >
> > What does op_result=2 mean?
> >
> > I think I've already suggested that you add some debug output to
> > your stonith RA (you implemented it yourself, right?).
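A minimal sketch of such a debug prologue for an external stonith plugin (these plugins are plain shell scripts invoked with a sub-command such as reset/status/gethosts as $1). The log path default and filename are placeholders; on a real node you would point DEBUG_LOG somewhere persistent like /var/log:

```shell
#!/bin/sh
# Hypothetical debug prologue for an external stonith shell plugin.
# DEBUG_LOG defaults to /tmp here so the sketch runs anywhere.
DEBUG_LOG=${DEBUG_LOG:-/tmp/drac4-debug.log}
echo "$(date '+%b %d %H:%M:%S') $0 called with: $*" >>"$DEBUG_LOG"
exec 2>>"$DEBUG_LOG"   # route stderr (including the trace below) to the log
set -x                 # print every command as the plugin executes it
```

With this at the top of the script, every invocation and every command the plugin runs ends up in the log, which makes it obvious where a nonzero exit code comes from.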
> >
> > This is also no good:
> >
> > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> > native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> > active on 2 nodes.
> >
> > You sure that drac4 always returns good status?
> >
> > Timeouts: They are really way too short. I think there was
> > already a discussion on the list about that. Short timeouts
> > will usually work, but it may happen, and most probably will,
> > that they don't. Then you'll wonder what the heck went wrong.
> >
> > Intervals: Too often for my taste as I think I already said, but
> > hey, it's your cluster :)
> >
> > Constraints: I'm not sure if you want to allow test-1_DRAC to run
> > on test-1. Setting the score to INFINITY won't help if the preferred
> > node is down. It'd be better and more logical to go the other way
> > around: set -INFINITY for the nodes where it shouldn't run.
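The -INFINITY approach could be sketched as a rsc_location constraint in cib.xml. This assumes heartbeat v2 CIB syntax; the ids and the node name are placeholders to adapt:

```xml
<rsc_location id="loc-test-1_DRAC-not-on-test-1" rsc="test-1_DRAC">
  <rule id="loc-test-1_DRAC-not-on-test-1-rule" score="-INFINITY">
    <expression id="loc-test-1_DRAC-not-on-test-1-expr"
                attribute="#uname" operation="eq" value="test-1.domain"/>
  </rule>
</rsc_location>
```

With a rule like this, the resource is simply forbidden on its own node, and the cluster remains free to place it on any surviving node rather than chasing an INFINITY preference for a node that may be down.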
> >
> > And please go on and read again
> > http://linux-ha.org/ReportingProblems
> > It says, in a Bold type:
> >
> > Please don't send "cleaned up" logs.
> >
> > And there ain't no running CIB either.
> >
> > > I'm quite sure the plugin is working properly and my cib.xml seems
> > > sane enough. I'm having a hard time digesting the logs and
> > > understanding where the trouble is coming from. Attached again are
> > > my cib.xml and the logs from when heartbeat noticed test-2 was dead
> > > all the way up to STONITH failing for the second time...it loops
> > > and tries to STONITH the dead node indefinitely. It almost seems
> > > as if heartbeat is 'restarting', but like I said, I'm having a
> > > hard time getting useful information from the logs.
> > >
> > > Sorry if I'm being a pest, but I really don't like not figuring
> > > stuff out.
> > > Let me know if you need more info.
> > > --BO
> > >
> > > On 4/3/07, Alan Robertson < [EMAIL PROTECTED]> wrote:
> > > >
> > > >Bjorn Oglefjorn wrote:
> > > >> Anyone? Help?
> > > >> --BO
> > > >>
> > > >> On 4/2/07, Bjorn Oglefjorn < [EMAIL PROTECTED]> wrote:
> > > >>>
> > > >>> Any ideas as to what's going wrong here?
> > > >
> > > >There is so much send/reply/try/fail/fix stuff in the email that I
> > > >had trouble following what was going on.
> > > >
> > > >Could you try reposting this cleanly and explain what symptoms
> > > >you're seeing? I just saw "it doesn't work", and that's not very
> > > >helpful.
> > > >
> > > >> See also: http://linux-ha.org/ReportingProblems
> > > >
> > > >
> > > >--
> > > > Alan Robertson < [EMAIL PROTECTED]>
> > > >
> > > >"Openness is the foundation and preservative of friendship... Let
> > me
> > > >claim from you at all times your undisguised opinions." - William
> > > >Wilberforce
> > > >_______________________________________________
> > > >Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > >See also: http://linux-ha.org/ReportingProblems
> > > >
> >
> > > Mar 30 09:37:56 test-1 heartbeat: [836]: WARN: node test-2.domain:
> > is dead
> > > Mar 30 09:37:56 test-1 heartbeat: [836]: info: Link
> > test-2.domain:eth0 dead.
> > > Mar 30 09:37:56 test-1 crmd: [857]: notice: crmd_ha_status_callback:
> > callbacks.c Status update: Node test-2.domain now has status [dead]
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_NOT_PRIMARY from ccm
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: mem_handle_event:
> > instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > > Mar 30 09:37:56 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> > callbacks.c Quorum lost after event=NOT PRIMARY (id=2)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 148544 bytes (was 153264)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 24): 0.4.134 (ok)
> > > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_NOT_PRIMARY from ccm
> > > Mar 30 09:37:56 test-1 cib: [853]: info: mem_handle_event:
> > instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
> > > Mar 30 09:37:56 test-1 cib: [973]: info: write_cib_contents:io.c Wrote version 0.4.134 of the CIB to disk (digest: 9c22a56909f0417ecc1f3af561f3523a)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_INVALID from ccm
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_INVALID from ccm
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: no
> > mbr_track info
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: no
> > mbr_track info
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event: Got an
> > event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event: Got an
> > event OC_EV_MS_NEW_MEMBERSHIP from ccm
> > > Mar 30 09:38:01 test-1 cib: [853]: info: mem_handle_event:
> > instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: mem_handle_event:
> > instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: crmd_ccm_msg_callback:
> > callbacks.c Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> > callbacks.c LOST: test-2.domain
> > > Mar 30 09:38:01 test-1 crmd: [857]: WARN: check_dead_member:ccm.c Our DC node (test-2.domain) left the cluster
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_ccm_msg_callback:
> > callbacks.c PEER: test-1.domain
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c NEW
> > MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> > > Mar 30 09:38:01 test-1 ccm: [852]: info: Break tie for 2 nodes
> > cluster
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail:ccm.c CURRENT: test-1.domain [nodeid=0, born=3]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: ccm_event_detail: ccm.c
> > LOST: test-2.domain [nodeid=1, born=1]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_NOT_DC -> S_ELECTION [
> > input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to <null> (<null>)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> > election.c Updated voted hash for test-1.domain to vote
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_election_count_vote:
> > election.c Election ignore: our vote (test-1.domain)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain : State transition S_ELECTION -> S_INTEGRATION [
> > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:
> > subsystems.c Starting sub-system "tengine"
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: start_subsystem:
> > subsystems.c Starting sub-system "pengine"
> > > Mar 30 09:38:01 test-1 tengine: [974]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 15
> > > Mar 30 09:38:01 test-1 tengine: [974]: info:
> > G_main_add_TriggerHandler: Added signal manual handler
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_dc_takeover:election.c Taking over DC status for this partition
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to <null> (<null>)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info:
> > do_dc_join_offer_all:join_dc.c join-1: Waiting on 1 outstanding join acks
> > > Mar 30 09:38:01 test-1 pengine: [975]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 15
> > > Mar 30 09:38:01 test-1 pengine: [975]: info: init_start:main.c Starting pengine
> > > Mar 30 09:38:01 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 148648 bytes (was 148544)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Local-only Change (client:857, call: 25): 0.4.134 (ok)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_process_readwrite:
> > messages.c We are now in R/W mode
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:28): 0.4.134 -> 0.4.135 (ok)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_null_callback:
> > callbacks.c Setting cib_diff_notify callbacks for tengine: on
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Registering TE UUID: 7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: set_graph_functions:
> > utils.c Setting custom graph functions
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition -1: 0 actions in 0 synapses
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: init_start:main.c Starting tengine
> > > Mar 30 09:38:01 test-1 cib: [976]: info: write_cib_contents:io.c Wrote version 0.4.135 of the CIB to disk (digest: 676f407011dd2084813f1c414db97ce5)
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_dc: utils.c Set DC
> > to test-1.domain (1.0.6)
> > > Mar 30 09:38:01 test-1 cib: [853]: info: sync_our_cib:messages.c Syncing CIB to all peers
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain : State transition S_INTEGRATION -> S_FINALIZE_JOIN [
> > input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> > > Mar 30 09:38:01 test-1 attrd: [856]: info: attrd_local_callback:
> > attrd.c Sending full refresh
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes responded to the join offer.
> > > Mar 30 09:38:01 test-1 crmd: [857]: info: update_attrd:join_dc.c
> > Connecting to attrd...
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:31): 0.4.135 -> 0.4.136 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.4.135 -> 0.4.136
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:32): 0.4.136 -> 0.5.137 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_bump): 0.4.136 -> 0.5.137
> > > Mar 30 09:38:01 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:33): 0.5.137 -> 0.5.138 (ok)
> > > Mar 30 09:38:01 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.137 -> 0.5.138
> > > Mar 30 09:38:01 test-1 cib: [977]: info: write_cib_contents:io.c Wrote version 0.5.138 of the CIB to disk (digest: ff63aedd166b5e7c8880f380b334321e)
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: update_dc:utils.c Set DC
> > to test-1.domain (1.0.6)
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_dc_join_ack:join_dc.c
> > join-1: Updating node state to member for test-1.domain )
> > > Mar 30 09:38:02 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:34): 0.5.138 -> 0.5.139 (ok)
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.138 -> 0.5.139
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
> > input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > > Mar 30 09:38:02 test-1 cib: [978]: info: write_cib_contents:io.c Wrote version 0.5.139 of the CIB to disk (digest: 86e0ea1b0a56ca84cf3d1f0e261f7816)
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:
> > [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2"
> > cib_feature_revision=" 1.3" generated="true" ccm_transition="3"
> > dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="139"/>
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > determine_online_status:unpack.c Node test-1.domain is online
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > determine_online_status_fencing: unpack.c ha_state=dead,
> > ccm_state=false, crm_state=online, join_state=down, expected=member
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > determine_online_status:unpack.c Node test-2.domain is unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: Resource Group:
> > test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test_IP (heartbeat::ocf:IPaddr): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > httpd (heartbeat::ocf:apache): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test-1_DRAC (stonith:external/drac4): Started test-2.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > test-2_DRAC (stonith:external/drac4): Started test-1.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test-1_DRAC (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: StopRsc: native.c
> > test-1.domain Stop test-2_DRAC
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: DeleteRsc:native.c Removing test-2_DRAC from test-1.domain
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: stage6:allocate.c Scheduling Node test-2.domain for STONITH
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource test_IP is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > native_stop_constraints: native.c Re-creating actions for test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource httpd is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: info:
> > native_stop_constraints: native.c Re-creating actions for test_group
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: NoRoleChange:native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: custom_action:utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: Recurring: native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN:
> > native_stop_constraints:native.c Stop of failed resource test-1_DRAC
> > is implict after test-2.domain is fenced
> > > Mar 30 09:38:02 test-1 pengine: [975]: notice: stage8:allocate.c Created transition graph 0.
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c No value specified for cluster preference:
> > pe-warn-series-max
> > > Mar 30 09:38:02 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c Transition 0: WARNINGs found during PE processing. PEngine
> > Input stored in: /var/lib/heartbeat/pengine/pe-warn-45.bz2
> > > Mar 30 09:38:02 test-1 pengine: [975]: info: process_pe_message:
> > pengine.c Configuration WARNINGs found during PE processing. Please
> > run "crm_verify -L" to identify issues.
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE
> > [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: unpack_graph:unpack.c Unpacked transition 0: 14 actions in 14 synapses
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 9 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 6 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 13 confirmed
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 14: test-1_DRAC_start_0 on test-1.domain
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 15: test-2_DRAC_stop_0 on test-1.domain
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op start on test-1_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:02 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 3 confirmed
> > > Mar 30 09:38:02 test-1 lrmd: [979]: info: Try to start STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:02 test-1 crmd: [857]: info: do_lrm_rsc_op:lrm.c Performing op stop on test-2_DRAC (interval=0ms, key=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:03 test-1 lrmd: [981]: info: Try to stop STONITH
> > resource <rsc_id=test-2_DRAC> : Device=external/drac4
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM operation (8) stop_0 on test-2_DRAC complete
> > > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 150940 bytes (was 148648)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:36): 0.5.139 -> 0.5.140 (ok)
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.139 -> 0.5.140
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_stop_0 (15) confirmed
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 16: test-2_DRAC_delete_0 on test-1.domain
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Removing resource test-2_DRAC from the LRM
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: send_direct_ack:lrm.c ACK'ing resource op: delete for test-2_DRAC
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_delete_0 (16) confirmed
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Executing crm-event (17): lrm_refresh on test-1.domain
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Skipping wait for 17
> > > Mar 30 09:38:03 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> > get the value of field lrm_opstatus from a ha_msg
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: msg_to_op: Message
> > follows:
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with
> > 13 fields
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> > [lrm_rid=test-1_DRAC]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] :
> > [lrm_timeout=30000]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> >
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[8] :
> > [lrm_userdata=0:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[9] :
> > [(2)lrm_param=0x907df78(164 198)]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG: Dumping message with
> > 6 fields
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=
> > test-1.drac.domain]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[1] :
> > [CRM_meta_op_target_rc=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[2] : [DRAC_LOGIN=root]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[3] :
> > [DRAC_PASSWD=********]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[4] :
> > [CRM_meta_timeout=30000]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[5] : [crm_feature_set=
> > 1.0.6]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=7]
> > > Mar 30 09:38:03 test-1 crmd: [857]: info: do_lrm_invoke:lrm.c Forcing a local LRM refresh
> > > Mar 30 09:38:03 test-1 cib: [985]: info: write_cib_contents:io.c Wrote version 0.5.140 of the CIB to disk (digest: da245127f5a42d016fecac8050eb99a1)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: activateCibXml:io.c CIB
> > size is 144888 bytes (was 150940)
> > > Mar 30 09:38:03 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:37): 0.5.140 -> 0.5.141 (ok)
> > > Mar 30 09:38:03 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.140 -> 0.5.141
> > > Mar 30 09:38:03 test-1 cib: [986]: info: write_cib_contents:io.c Wrote version 0.5.141 of the CIB to disk (digest: da6197ce54aa4e2715dd524f8f2a40c5)
> > > Mar 30 09:38:05 test-1 stonithd: [855]: WARN: G_SIG_dispatch:
> > Dispatch function for SIGCHLD took too long to execute: 350 ms (> 10 ms)
> > (GSource: 0x929b820)
> > > Mar 30 09:38:05 test-1 crmd: [857]: info: process_lrm_event:lrm.c LRM operation (7) start_0 on test-1_DRAC complete
> > > Mar 30 09:38:05 test-1 cib: [853]: info: cib_diff_notify:notify.c Update (client: 857, call:38): 0.5.141 -> 0.5.142 (ok)
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.141 -> 0.5.142
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_start_0 (14) confirmed
> > > Mar 30 09:38:05 test-1 tengine: [974]: info: te_fence_node:actions.c Executing reboot fencing operation (18) on test-2.domain (timeout=15000)
> > > Mar 30 09:38:05 test-1 stonithd: [855]: info: client tengine [pid:
> > 974] want a STONITH operation RESET to node test-2.domain.
> > > Mar 30 09:38:05 test-1 stonithd: [855]: info: Broadcasting the
> > message succeeded: require others to stonith node test-2.domain.
> > > Mar 30 09:38:05 test-1 cib: [988]: info: write_cib_contents:io.c Wrote version 0.5.142 of the CIB to disk (digest: d5d12bf8f6bdd66647f4dc5cc27adb6e)
> > > Mar 30 09:38:20 test-1 stonithd: [855]: info: Failed to STONITH the
> > node test-2.domain: optype=1, op_result=2
> > > Mar 30 09:38:20 test-1 tengine: [974]: info:
> > tengine_stonith_callback: callbacks.c call=-2, optype=1, node_name=
> > test-2.domain, result=2, node_list=,
> > action=18;0:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:20 test-1 tengine: [974]: ERROR:
> > tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> > (2)... aborting transition.
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort action 0 superceeded by 2
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c
> > test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE
> > [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:20 test-1 tengine: [974]: info: run_graph:graph.c ====================================================
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > > Mar 30 09:38:20 test-1 tengine: [974]: notice: run_graph:graph.c Transition 0: (Complete=9, Pending=0, Fired=0, Skipped=5, Incomplete=0)
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:
> > [generation] <cib admin_epoch="0" have_quorum="true" num_peers="2"
> > cib_feature_revision=" 1.3" generated="true" ccm_transition="3"
> > dc_uuid="b3bba1ca-b072-49ac-8e93-a2c6fbf4678e" epoch="5" num_updates="142"/>
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default stickiness: 1000
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Default failure stickiness: -400
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH of failed nodes is enabled
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c STONITH will reboot nodes
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resources are stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Orphan resource actions are stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c Stopped resources are removed from the status section: true
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: unpack_config:unpack.c By default resources are managed
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > determine_online_status:unpack.c Node test-1.domain is online
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: determine_online_status_fencing:unpack.c Node test-2.domain (d822d67b-5495-47c9-bdb9-f4f66e0bea85) is un-expectedly down
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > determine_online_status_fencing: unpack.c ha_state=dead,
> > ccm_state=false, crm_state=online, join_state=down, expected=member
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > determine_online_status:unpack.c Node test-2.domain is unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_add_running:
> > native.c Resource stonith::external/drac4:test-1_DRAC appears to be
> > active on 2 nodes.
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: Resource Group:
> > test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test_IP (heartbeat::ocf:IPaddr): Started test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > httpd (heartbeat::ocf:apache): Started test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test-1_DRAC (stonith:external/drac4)
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: 0 : test-1.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: 1 : test-2.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > test-2_DRAC (stonith:external/drac4): Stopped
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: native_create_probe:
> > native.c test-1.domain: Created probe for test-2_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: native_create_actions:
> > native.c Attempting recovery of resource test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c
> > test-1.domain Stop test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: DeleteRsc:
> > native.c Removing test-1_DRAC from test-1.domain
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StopRsc:native.c
> > test-2.domain Stop test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test-1_DRAC_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: StartRsc: native.c
> > test-1.domain Start test-1_DRAC
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: stage6:
> > allocate.c Scheduling Node test-2.domain for STONITH
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource test_IP is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > native_stop_constraints:native.c Re-creating actions for test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource httpd is
> > implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: info:
> > native_stop_constraints:native.c Re-creating actions for test_group
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource test_IP (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action test_IP_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain test_IP_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: NoRoleChange:
> > native.c Move resource httpd (test-2.domain -> test-1.domain)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Action httpd_stop_0 on test-2.domain is unrunnable (offline)
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: custom_action:
> > utils.c Marking node test-2.domain unclean
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: Recurring:native.c
> > test-1.domain httpd_monitor_10000
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN:
> > native_stop_constraints: native.c Stop of failed resource test-1_DRAC
> > is implict after test-2.domain is fenced
> > > Mar 30 09:38:20 test-1 pengine: [975]: notice: stage8:
> > allocate.c Created transition graph 1.
> > > Mar 30 09:38:20 test-1 pengine: [975]: WARN: process_pe_message:
> > pengine.c No value specified for cluster preference:
> > pe-error-series-max
> > > Mar 30 09:38:20 test-1 pengine: [975]: ERROR: process_pe_message:
> > pengine.c Transition 1: ERRORs found during PE processing. PEngine
> > Input stored in: /var/lib/heartbeat/pengine/pe- error-128.bz2
> > > Mar 30 09:38:20 test-1 pengine: [975]: info: process_pe_message:
> > pengine.c Configuration WARNINGs found during PE processing. Please
> > run "crm_verify -L" to identify issues.
> > > Mar 30 09:38:20 test-1 crmd: [857]: info: do_state_transition: fsa.c
> > test-1.domain: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE
> > [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:20 test-1 tengine: [974]: info:
> > unpack_graph: unpack.c Unpacked transition 1: 17 actions in 17 synapses
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 10 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 7 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 14: test-1_DRAC_stop_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 17 confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 3: test-2_DRAC_monitor_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op stop on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 4 confirmed
> > > Mar 30 09:38:21 test-1 lrmd: [989]: info: Try to stop STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op monitor on test-2_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (9) stop_0 on test-1_DRAC complete
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (10) monitor_0 on test-2_DRAC Error: (7) not running
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 147180 bytes (was 144888)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:40): 0.5.142 -> 0.5.143 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.142 -> 0.5.143
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_stop_0 (14) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 15: test-1_DRAC_delete_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:
> > lrm.c Removing resource test-1_DRAC from the LRM
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: send_direct_ack:
> > lrm.c ACK'ing resource op: delete for test-1_DRAC
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_delete_0 (15) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 18: test-1_DRAC_start_0 on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Executing crm-event (16): lrm_refresh on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_crm_command:
> > actions.c Skipping wait for 16
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_rsc_op:
> > lrm.c Performing op start on test-1_DRAC (interval=0ms, key=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f)
> > > Mar 30 09:38:21 test-1 lrmd: [993]: info: Try to start STONITH
> > resource <rsc_id=test-1_DRAC> : Device=external/drac4
> > > Mar 30 09:38:21 test-1 crmd: [857]: WARN: msg_to_op(1151): failed to
> > get the value of field lrm_opstatus from a ha_msg
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: msg_to_op: Message
> > follows:
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with
> > 13 fields
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [lrm_t=op]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] :
> > [lrm_rid=test-1_DRAC]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] : [lrm_op=start]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] :
> > [lrm_timeout=30000]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [lrm_interval=0]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[5] : [lrm_delay=0]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[6] : [lrm_targetrc=-1]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[7] : [lrm_app=crmd]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[8] :
> > [lrm_userdata=1:7ec7d2e0-ae10-4810-a67a-73119ab6855f]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[9] :
> > [(2)lrm_param=0x90afd20(140 168)]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG: Dumping message with
> > 5 fields
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[0] : [DRAC_ADDR=
> > test-1.drac.domain]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[1] : [DRAC_LOGIN=root]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[2] :
> > [DRAC_PASSWD=********]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[3] :
> > [CRM_meta_timeout=30000]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[4] : [crm_feature_set=
> > 1.0.6]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[10] : [lrm_callid=11]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[11] : [lrm_app=crmd]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: MSG[12] : [lrm_callid=11]
> > > Mar 30 09:38:21 test-1 crmd: [857]: info: do_lrm_invoke:
> > lrm.c Forcing a local LRM refresh
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 150940 bytes (was 147180)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:41): 0.5.143 -> 0.5.144 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.143 -> 0.5.144
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-2_DRAC_monitor_0 (3) confirmed
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: send_rsc_command:
> > actions.c Initiating action 2: probe_complete on test-1.domain
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_pseudo_action:
> > actions.c Pseudo action 1 confirmed
> > > Mar 30 09:38:21 test-1 cib: [853]: info: activateCibXml: io.c CIB
> > size is 146356 bytes (was 150940)
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:42): 0.5.144 -> 0.5.145 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.144 -> 0.5.145
> > > Mar 30 09:38:21 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:43): 0.5.145 -> 0.5.146 (ok)
> > > Mar 30 09:38:21 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.145 -> 0.5.146
> > > Mar 30 09:38:21 test-1 cib: [995]: info: write_cib_contents:
> > io.c Wrote version 0.5.146 of the CIB to disk (digest: 53f6e570e78f4cf92213e89be4ae0573)
> > > Mar 30 09:38:24 test-1 stonithd: [855]: WARN: G_SIG_dispatch:
> > Dispatch function for SIGCHLD took too long to execute: 350 ms (> 10 ms)
> > (GSource: 0x929b820)
> > > Mar 30 09:38:24 test-1 crmd: [857]: info: process_lrm_event:
> > lrm.c LRM operation (11) start_0 on test-1_DRAC complete
> > > Mar 30 09:38:24 test-1 cib: [853]: info: cib_diff_notify:
> > notify.c Update (client: 857, call:44): 0.5.146 -> 0.5.147 (ok)
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: te_update_diff:
> > callbacks.c Processing diff (cib_update): 0.5.146 -> 0.5.147
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: match_graph_event:
> > events.c Action test-1_DRAC_start_0 (18) confirmed
> > > Mar 30 09:38:24 test-1 tengine: [974]: info: te_fence_node:
> > actions.c Executing reboot fencing operation (19) on test-2.domain (timeout=15000)
> > > Mar 30 09:38:24 test-1 stonithd: [855]: info: client tengine [pid:
> > 974] want a STONITH operation RESET to node test-2.domain.
> > > Mar 30 09:38:24 test-1 stonithd: [855]: info: Broadcasting the
> > message succeeded: require others to stonith node test-2.domain.
> > > Mar 30 09:38:24 test-1 cib: [999]: info: write_cib_contents:
> > io.c Wrote version 0.5.147 of the CIB to disk (digest: 7978776db187ec2d2397921646f4053b)
> > > Mar 30 09:38:39 test-1 stonithd: [855]: info: Failed to STONITH the
> > node test-2.domain: optype=1, op_result=2
> > > Mar 30 09:38:39 test-1 tengine: [974]: info:
> > tengine_stonith_callback:callbacks.c call=-3, optype=1, node_name=
> > test-2.domain, result=2, node_list=,
> > action=19;1:7ec7d2e0-ae10-4810-a67a-73119ab6855f
> > > Mar 30 09:38:39 test-1 tengine: [974]: ERROR:
> > tengine_stonith_callback:callbacks.c Stonith of test-2.domain failed
> > (2)... aborting transition.
> > > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort priority upgraded to 1000000
> > > Mar 30 09:38:39 test-1 tengine: [974]: info: update_abort_priority:
> > utils.c Abort action 0 superceeded by 2
> > > Mar 30 09:38:39 test-1 crmd: [857]: info: do_state_transition: fsa.c
> > test-1.domain: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE
> > [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > Mar 30 09:38:39 test-1 tengine: [974]: info:
> > run_graph: graph.c ====================================================
> > > <cib admin_epoch="0" epoch="0" num_updates="0">
> > > <configuration>
> > > <crm_config>
> > > <cluster_property_set id="cib-bootstrap-options">
> > > <attributes>
> > > <nvpair
> > id="cib-bootstrap-options-transition_idle_timeout"
> > name="transition_idle_timeout" value="5min"/>
> > > <nvpair id="cib-bootstrap-options-stonith_enabled"
> > name="stonith_enabled" value="true"/>
> > > <nvpair
> > id="cib-bootstrap-options-default_resource_stickiness"
> > name="default_resource_stickiness" value="1000"/>
> > > <nvpair id="cib-bootstrap-options-short_resource_names"
> > name="short_resource_names" value="true"/>
> > > <nvpair
> > id="cib-bootstrap-options-default_resource_failure_stickiness"
> > name="default_resource_failure_stickiness" value="-400"/>
> > > <nvpair id="cib-bootstrap-options-stonith_action"
> > name="stonith_action" value="reboot"/>
> > > <nvpair id="cib-bootstrap-options-remove_after_stop"
> > name="remove_after_stop" value="true"/>
> > > <nvpair id="cib-bootstrap-options-default_action_timeout"
> > name="default_action_timeout" value="30s"/>
> > > <nvpair id="cib-bootstrap-options-symmetric_cluster"
> > name="symmetric_cluster" value="true"/>
> > > <nvpair id="cib-bootstrap-options-no_quorum_policy"
> > name="no_quorum_policy" value="stop"/>
> > > <nvpair id="cib-bootstrap-options-stop_orphan_resources"
> > name="stop_orphan_resources" value="true"/>
> > > <nvpair id="cib-bootstrap-options-stop_orphan_actions"
> > name="stop_orphan_actions" value="true"/>
> > > <nvpair id="cib-bootstrap-options-is_managed_default"
> > name="is_managed_default" value="true"/>
> > > </attributes>
> > > </cluster_property_set>
> > > </crm_config>
> > > <nodes/>
> > > <resources>
> > > <group id="test_group">
> > > <primitive class="ocf" id="test_IP" provider="heartbeat"
> > type="IPaddr">
> > > <operations>
> > > <op id="test_IP_mon" interval="10s" name="monitor"
> > timeout="9s" on_fail="restart"/>
> > > <op id="test_IP_start" name="start" timeout="20s"
> > on_fail="restart" prereq="fencing"/>
> > > <op id="test_IP_stop" name="stop" timeout="20s"
> > on_fail="restart"/>
> > > </operations>
> > > <instance_attributes id="test_IP_inst_attr">
> > > <attributes>
> > >             <nvpair id="test_IP_attr_0" name="ip" value="192.168.168.168"/>
> > > <nvpair id="test_IP_attr_1" name="netmask"
> > value="25"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > <primitive class="ocf" id="httpd" provider="heartbeat"
> > type="apache">
> > > <operations>
> > > <op id="httpd_mon" name="monitor" interval="10s"
> > timeout="9s" on_fail="restart"/>
> > > <op id="httpd_start" name="start" timeout="20s"
> > on_fail="restart" prereq="fencing"/>
> > > <op id="httpd_stop" name="stop" timeout="20s"
> > on_fail="restart"/>
> > > </operations>
> > > </primitive>
> > > </group>
> > > <primitive id="test-1_DRAC" class="stonith"
> > type="external/drac4" provider="heartbeat">
> > > <operations>
> > > <op id="test-1_DRAC_reset" name="reset" timeout="3m"
> > prereq="nothing"/>
> > > </operations>
> > > <instance_attributes id="test-1_DRAC_inst_attr">
> > > <attributes>
> > >             <nvpair id="test-1_DRAC_attr_0" name="DRAC_ADDR" value="test-1.drac.domain"/>
> > > <nvpair id="test-1_DRAC_attr_1" name="DRAC_LOGIN"
> > value="root"/>
> > > <nvpair id="test-1_DRAC_attr_2" name="DRAC_PASSWD"
> > value="********"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > <primitive id="test-2_DRAC" class="stonith"
> > type="external/drac4" provider="heartbeat">
> > > <operations>
> > > <op id="test-2_DRAC_reset" name="reset" timeout="3m"
> > prereq="nothing"/>
> > > </operations>
> > > <instance_attributes id="test-2_DRAC_inst_attr">
> > > <attributes>
> > > <nvpair id="test-2_DRAC_attr_0" name="DRAC_ADDR"
> > value="test-2.drac.domain"/>
> > > <nvpair id="test-2_DRAC_attr_1" name="DRAC_LOGIN"
> > value="root"/>
> > > <nvpair id="test-2_DRAC_attr_2" name="DRAC_PASSWD"
> > value="********"/>
> > > </attributes>
> > > </instance_attributes>
> > > </primitive>
> > > </resources>
> > > <constraints>
> > > <rsc_location id="test_group_location_test-1"
> > rsc="test_group">
> > > <rule id="prefered_location_test_group_test-1"
> > score="1000">
> > >           <expression attribute="#uname" id="prefered_location_test_group_test-1_expr_1" operation="eq" value="test-1.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test_group_location_test-2"
> > rsc="test_group">
> > > <rule id="prefered_location_test_group_test-2"
> > score="1000">
> > >           <expression attribute="#uname" id="prefered_location_test_group_test-2_expr_1" operation="eq" value="test-2.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test-1_DRAC_location" rsc="test-1_DRAC">
> > > <rule id="prefered_location_test-1_DRAC" score="INFINITY">
> > >           <expression attribute="#uname" id="prefered_location_test-1_DRAC_expr_1" operation="eq" value="test-2.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_location id="test-2_DRAC_location" rsc="test-2_DRAC">
> > > <rule id="prefered_location_test-2_DRAC" score="INFINITY">
> > >           <expression attribute="#uname" id="prefered_location_test-2_DRAC_expr_1" operation="eq" value="test-1.domain"/>
> > > </rule>
> > > </rsc_location>
> > > <rsc_order id="Ord_websrv" from="httpd" type="after"
> > to="test_IP"/>
> > > <rsc_colocation id="Colo_not_same_DRAC" from="test-1_DRAC"
> > to="test-2_DRAC" score="-INFINITY"/>
> > > </constraints>
> > > </configuration>
> > > <status/>
> > > </cib>
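On the debug-output suggestion: external/* stonith plugins are plain shell scripts that receive the requested action (status, reset, gethosts, ...) as $1 and their parameters (DRAC_ADDR, DRAC_LOGIN, ...) as environment variables, so a small tracing helper can record every invocation and its exit code. This is only a sketch under those assumptions; the log path, function names, and placeholder dispatch are illustrative, not the real drac4 plugin:

```shell
#!/bin/sh
# Hypothetical tracing shim for an external stonith plugin (e.g. external/drac4).
# Assumption: the plugin is a shell script called with the action as $1 and
# configured via environment variables such as DRAC_ADDR.

DEBUG_LOG=${DEBUG_LOG:-/tmp/drac4-debug.log}   # assumed path, adjust as needed

drac4_trace() {
    # Append a timestamped record of the action and its exit code.
    printf '%s drac4: action=%s DRAC_ADDR=%s rc=%s\n' \
        "$(date '+%b %d %T')" "$1" "${DRAC_ADDR:-unset}" "$2" >>"$DEBUG_LOG"
}

drac4_dispatch() {
    # Placeholder dispatch; the real plugin would contact the DRAC here.
    case "$1" in
        gethosts|getconfignames|status) rc=0 ;;
        reset|on|off)                   rc=0 ;;  # substitute the real DRAC call
        *)                              rc=1 ;;  # unknown action
    esac
    drac4_trace "$1" "$rc"
    return $rc
}
```

Pasting a helper like drac4_trace into the plugin and calling it before every exit would show exactly which action stonithd invoked and what the script returned, which should help pin down where op_result=2 comes from.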
> >
> >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> >
> > --
> > Dejan
> >
>
>
>