Hi all,
we are using external/ipmi as the stonith device on a drbd8 /
heartbeat-2.0.7-2 cluster. While the stonith itself works well, there
are some recurring errors that eventually lead to unwanted reboots. And
maybe there is no real cause, just my lack of understanding...
Anyhow, in the ha-debug log of server1 I see:
> tengine[4459]: 2009/01/13_03:40:38 info: process_graph_event: Detected
> action server1-fencing_monitor_120000 from a different transition: 29 vs. 31
> tengine[4459]: 2009/01/13_03:40:38 info: update_abort_priority: Abort
> priority upgraded to 1000000
> tengine[4459]: 2009/01/13_03:40:38 WARN: update_failcount: Updating
> failcount for server1-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc
> after failed monitor: rc=14
Then pengine goes on about that problem:
> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node
> server2 is online
> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
> server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing
> failed op server1-fencing_monitor_120000 on server2: Error
> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node
> server1 is online
> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
> server2-fencing_start_0 (rc=1) on server1 to an ERROR
> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing
> failed op server2-fencing_start_0 on server1: Error
> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Compatability
> handling for failed op server2-fencing_start_0 on server1
This was already the second instance of this problem last night; the
first occurred five hours earlier and had no further consequences.
This time, however, after walking through all the resources, pengine says:
> pengine[4460]: 2009/01/13_03:40:38 WARN: stage6: Scheduling Node
> server2 for STONITH
This repeats a few times, until stonithd finally gives up:
> stonithd[3621]: 2009/01/13_03:42:19 ERROR: Failed to STONITH the node
> server2: optype=RESET, op_result=TIMEOUT
The other machine, server2, did not show any problems whatsoever that
should have triggered the action.
The intended victim this time was the slave. However, during the last
weekend both machines showed very strange behavior, including rebooting
each other (with a day's break in between, so there was no stonith war
going on, but still...).
Now, in the cib.xml we have the following two primitives:
<primitive class="stonith" type="external/ipmi"
provider="heartbeat" id="server1-fencing">
<operations>
<op id="server1-fencing-monitor" name="monitor"
interval="120s" timeout="70s" prereq="nothing" start_delay="0"
disabled="false" role="Started" on_fail="fence"/>
<op id="server1-fencing-start" name="start" timeout="40s"
prereq="nothing" start_delay="0" disabled="false" role="Started"/>
</operations>
<instance_attributes id="server1-fencing-ia">
<attributes>
<nvpair id="server1-fencing-hostname" name="hostname"
value="server1"/>
<nvpair id="server1-fencing-ipaddr" name="ipaddr"
value="1.1.1.1"/>
<nvpair id="server1-fencing-userid" name="userid"
value="USER"/>
<nvpair id="server1-fencing-passwd" name="passwd"
value="PASSWD"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="server2-fencing" class="stonith"
type="external/ipmi" provider="heartbeat">
<operations>
<op id="server2-fencing-monitor" name="monitor"
interval="120s" timeout="70s" prereq="nothing"/>
<op id="server2-fencing-start" name="start" timeout="40s"
prereq="nothing"/>
</operations>
<instance_attributes id="server2-fencing-ia">
<attributes>
<nvpair id="server2-fencing-hostname" name="hostname"
value="server2"/>
<nvpair id="server2-fencing-ipaddr" name="ipaddr"
value="1.1.1.2"/>
<nvpair id="server2-fencing-userid" name="userid"
value="USER"/>
<nvpair id="server2-fencing-passwd" name="passwd"
value="PASSWD"/>
</attributes>
</instance_attributes>
</primitive>
The intent of this is to reboot a machine when a failover cannot be
completed because heartbeat on the failing node cannot release one of
its resources (Lustre mounts of the drbd disks).
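Related to that, I wonder whether we are also missing location
constraints that keep each fencing resource off the node it is supposed
to fence, so that e.g. server2-fencing can never end up on server2
itself (which is where it is running right now, according to the logs).
Something like the following untested sketch, in the style of our
existing constraints (the ids are just made up):

<rsc_location rsc="server2-fencing" id="server2-fencing-not-on-server2">
  <rule score="-INFINITY" id="server2-fencing-not-on-server2-rule">
    <expression attribute="#uname" operation="eq" value="server2" id="server2-fencing-not-on-server2-expr"/>
  </rule>
</rsc_location>

plus the analogous rule for server1-fencing / server1.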
So my question is, of course, whether there is something fundamentally
wrong with this configuration (I'm attaching the entire cib.xml and the
ha-debug from last night, btw), or whether the error really lies
somewhere else (hardware).
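To rule out the hardware side, I suppose the BMCs can also be checked
by hand with ipmitool (which, as far as I understand, is what
external/ipmi calls underneath). For example, from server1 against the
address that server2-fencing uses -- options from memory, so treat this
as a sketch:

# run on server1; 1.1.1.2 is the IPMI address of server2
ipmitool -I lan -H 1.1.1.2 -U USER -P PASSWD chassis power status

If that already fails or times out when run on server1, it would at
least be consistent with server2-fencing failing to start there (rc=1)
and with the reset of server2 timing out.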
Thanks a lot,
Thomas
<configuration>
  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1225807850"/>
      </attributes>
    </cluster_property_set>
  </crm_config>
  <nodes>
    <node id="ebffe771-505c-4e40-b0b3-d70903ed37bc" uname="server2" type="normal"/>
    <node id="4df91cdf-680f-4cb8-b5cc-ffd07d54dc1b" uname="server1" type="normal"/>
  </nodes>
  <resources>
    <group id="group_1">
      <meta_attributes id="ma-group_1">
        <attributes>
          <nvpair id="ma-group_1" name="resource_stickiness" value="INFINITY"/>
        </attributes>
      </meta_attributes>
      <primitive class="ocf" id="commonIP" provider="heartbeat" type="IPaddr">
        <operations>
          <op id="commonIP_mon" interval="20s" name="monitor" timeout="15s"/>
        </operations>
        <instance_attributes id="commonIP_inst_attr">
          <attributes>
            <nvpair id="commonIP_attr_0" name="ip" value="1.1.2.4"/>
          </attributes>
        </instance_attributes>
      </primitive>
      <primitive class="heartbeat" id="drbddisk_2" provider="heartbeat" type="drbddisk">
        <operations>
          <op id="drbddisk_2_mon" interval="120s" name="monitor" timeout="120s"/>
        </operations>
        <instance_attributes id="drbddisk_2_inst_attr">
          <attributes>
            <nvpair id="drbddisk_2_attr_1" name="1" value="mgs"/>
          </attributes>
        </instance_attributes>
        <meta_attributes id="drbddisk_2_meta_attrs">
          <attributes>
            <nvpair id="drbddisk_2_metaattr_target_role" name="target_role" value="started"/>
          </attributes>
        </meta_attributes>
        <instance_attributes id="drbddisk_2">
          <attributes>
            <nvpair id="drbddisk_2-is_managed" name="is_managed" value="on"/>
          </attributes>
        </instance_attributes>
      </primitive>
      <primitive class="ocf" id="Filesystem_3" provider="heartbeat" type="Filesystem">
        <operations>
          <op id="Filesystem_3_mon" interval="120s" name="monitor" timeout="180s"/>
        </operations>
        <instance_attributes id="Filesystem_3_inst_attr">
          <attributes>
            <nvpair id="Filesystem_3_attr_0" name="device" value="/dev/drbd0"/>
            <nvpair id="Filesystem_3_attr_1" name="directory" value="/srv/mgs"/>
            <nvpair id="Filesystem_3_attr_2" name="fstype" value="lustre"/>
          </attributes>
        </instance_attributes>
        <meta_attributes id="Filesystem_3_meta_attrs">
          <attributes>
            <nvpair id="Filesystem_3_metaattr_target_role" name="target_role" value="started"/>
          </attributes>
        </meta_attributes>
      </primitive>
      <primitive class="heartbeat" id="drbddisk_4" provider="heartbeat" type="drbddisk">
        <operations>
          <op id="drbddisk_4_mon" interval="120s" name="monitor" timeout="120s"/>
        </operations>
        <instance_attributes id="drbddisk_4_inst_attr">
          <attributes>
            <nvpair id="drbddisk_4_attr_1" name="1" value="mdt"/>
          </attributes>
        </instance_attributes>
      </primitive>
      <primitive class="ocf" id="Filesystem_5" provider="heartbeat" type="Filesystem">
        <operations>
          <op id="Filesystem_5_mon" interval="120s" name="monitor" timeout="180s"/>
        </operations>
        <instance_attributes id="Filesystem_5_inst_attr">
          <attributes>
            <nvpair id="Filesystem_5_attr_0" name="device" value="/dev/drbd1"/>
            <nvpair id="Filesystem_5_attr_1" name="directory" value="/srv/mdt"/>
            <nvpair id="Filesystem_5_attr_2" name="fstype" value="lustre"/>
            <nvpair id="Filesystem_5_attr_3" name="options" value="acl"/>
          </attributes>
        </instance_attributes>
        <meta_attributes id="Filesystem_5_meta_attrs">
          <attributes>
            <nvpair id="Filesystem_5_meta_attrs-target_role" name="target_role" value="started"/>
          </attributes>
        </meta_attributes>
      </primitive>
      <primitive class="ocf" id="MailTo_6" provider="heartbeat" type="MailTo">
        <operations>
          <op id="MailTo_6_mon" interval="120s" name="monitor" timeout="60s"/>
        </operations>
        <instance_attributes id="MailTo_6_inst_attr">
          <attributes>
            <nvpair id="MailTo_6_attr_0" name="email" value="Admin-Email"/>
            <nvpair id="MailTo_6_attr_1" name="subject" value="Failover_on_Lustre"/>
          </attributes>
        </instance_attributes>
        <instance_attributes id="MailTo_6">
          <attributes>
            <nvpair id="MailTo_6-target_role" name="target_role" value="started"/>
            <nvpair id="MailTo_6-Filesystem_5" name="Filesystem_5" value="started"/>
          </attributes>
        </instance_attributes>
      </primitive>
    </group>
    <primitive class="stonith" type="external/ipmi" provider="heartbeat" id="server1-fencing">
      <operations>
        <op id="server1-fencing-monitor" name="monitor" interval="120s" timeout="70s" prereq="nothing" start_delay="0" disabled="false" role="Started" on_fail="fence"/>
        <op id="server1-fencing-start" name="start" timeout="40s" prereq="nothing" start_delay="0" disabled="false" role="Started"/>
      </operations>
      <instance_attributes id="server1-fencing-ia">
        <attributes>
          <nvpair id="server1-fencing-hostname" name="hostname" value="server1"/>
          <nvpair id="server1-fencing-ipaddr" name="ipaddr" value="1.1.1.1"/>
          <nvpair id="server1-fencing-userid" name="userid" value="USER"/>
          <nvpair id="server1-fencing-passwd" name="passwd" value="PASSWD"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive id="server2-fencing" class="stonith" type="external/ipmi" provider="heartbeat">
      <operations>
        <op id="server2-fencing-monitor" name="monitor" interval="120s" timeout="70s" prereq="nothing"/>
        <op id="server2-fencing-start" name="start" timeout="40s" prereq="nothing"/>
      </operations>
      <instance_attributes id="server2-fencing-ia">
        <attributes>
          <nvpair id="server2-fencing-hostname" name="hostname" value="server2"/>
          <nvpair id="server2-fencing-ipaddr" name="ipaddr" value="1.1.1.2"/>
          <nvpair id="server2-fencing-userid" name="userid" value="USER"/>
          <nvpair id="server2-fencing-passwd" name="passwd" value="PASSWD"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </resources>
  <constraints>
    <rsc_location rsc="group_1" id="rsc_location_group_1">
      <rule score="0" id="prefered_location_group_1">
        <expression attribute="#uname" operation="eq" id="prefered_location_group_1_expr" value="server1"/>
      </rule>
    </rsc_location>
    <rsc_location rsc="group_1" id="group_1:connected">
      <rule score="-INFINITY" id="prefered_group_1:connected">
        <expression attribute="default_ping_set" value="default_ping_set" id="group_1:connected:expr:undefined" operation="defined"/>
        <expression attribute="default_ping_set" id="group_1:connected:expr:zero" operation="lte" value="0"/>
      </rule>
    </rsc_location>
  </constraints>
</configuration>
tengine[4459]: 2009/01/12_22:34:28 info: process_graph_event: Action
server2-fencing_monitor_120000 arrived after a completed transition
tengine[4459]: 2009/01/12_22:34:28 info: update_abort_priority: Abort priority
upgraded to 1000000
tengine[4459]: 2009/01/12_22:34:28 WARN: update_failcount: Updating failcount
for server2-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc after failed
monitor: rc=14
crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
origin=route_message ]
crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: All 2 cluster nodes
are eligible to run resources.
tengine[4459]: 2009/01/12_22:34:28 info: extract_event: Aborting on
transient_attributes changes for ebffe771-505c-4e40-b0b3-d70903ed37bc
pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node server2
is online
pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping
server2-fencing_monitor_120000 (rc=14) on server2 to an ERROR
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op
server2-fencing_monitor_120000 on server2: Error
pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node server1
is online
pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping
server2-fencing_start_0 (rc=1) on server1 to an ERROR
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op
server2-fencing_start_0 on server1: Error
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Compatability handling
for failed op server2-fencing_start_0 on server1
pengine[4460]: 2009/01/12_22:34:28 notice: group_print: Resource Group: group_1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: commonIP
(heartbeat::ocf:IPaddr): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: drbddisk_2
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: Filesystem_3
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: drbddisk_4
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: Filesystem_5
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: MailTo_6
(heartbeat::ocf:MailTo): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server1-fencing
(stonith:external/ipmi): Started server2
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server2-fencing
(stonith:external/ipmi): Started server2 FAILED
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
commonIP (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
drbddisk_2 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
Filesystem_3 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
drbddisk_4 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
Filesystem_5 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
MailTo_6 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
server1-fencing (server2)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Recover resource
server2-fencing (server2)
pengine[4460]: 2009/01/12_22:34:28 notice: StopRsc: server2 Stop
server2-fencing
pengine[4460]: 2009/01/12_22:34:28 notice: StartRsc: server2 Start
server2-fencing
pengine[4460]: 2009/01/12_22:34:28 notice: RecurringOp: server2
server2-fencing_monitor_120000
pengine[4460]: 2009/01/12_22:34:28 info: process_pe_message: Transition 30:
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-406.bz2
pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node server2
is online
pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping
server2-fencing_monitor_120000 (rc=14) on server2 to an ERROR
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op
server2-fencing_monitor_120000 on server2: Error
pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node server1
is online
pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping
server2-fencing_start_0 (rc=1) on server1 to an ERROR
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op
server2-fencing_start_0 on server1: Error
pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Compatability handling
for failed op server2-fencing_start_0 on server1
pengine[4460]: 2009/01/12_22:34:28 notice: group_print: Resource Group: group_1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: commonIP
(heartbeat::ocf:IPaddr): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: drbddisk_2
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: Filesystem_3
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: drbddisk_4
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: Filesystem_5
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: MailTo_6
(heartbeat::ocf:MailTo): Started server1
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server1-fencing
(stonith:external/ipmi): Started server2
pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server2-fencing
(stonith:external/ipmi): Started server2 FAILED
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
commonIP (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
drbddisk_2 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
Filesystem_3 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
drbddisk_4 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
Filesystem_5 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
MailTo_6 (server1)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource
server1-fencing (server2)
pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Recover resource
server2-fencing (server2)
pengine[4460]: 2009/01/12_22:34:28 notice: StopRsc: server2 Stop
server2-fencing
pengine[4460]: 2009/01/12_22:34:28 notice: StartRsc: server2 Start
server2-fencing
pengine[4460]: 2009/01/12_22:34:28 notice: RecurringOp: server2
server2-fencing_monitor_120000
crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[4459]: 2009/01/12_22:34:28 info: unpack_graph: Unpacked transition 31:
4 actions in 4 synapses
tengine[4459]: 2009/01/12_22:34:28 info: te_pseudo_action: Pseudo action 13
fired and confirmed
tengine[4459]: 2009/01/12_22:34:28 info: send_rsc_command: Initiating action 3:
server2-fencing_stop_0 on server2
pengine[4460]: 2009/01/12_22:34:28 info: process_pe_message: Transition 31:
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-407.bz2
tengine[4459]: 2009/01/12_22:34:29 info: match_graph_event: Action
server2-fencing_stop_0 (3) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/12_22:34:29 info: send_rsc_command: Initiating action
32: server2-fencing_start_0 on server2
tengine[4459]: 2009/01/12_22:34:31 info: match_graph_event: Action
server2-fencing_start_0 (32) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/12_22:34:31 info: send_rsc_command: Initiating action 2:
server2-fencing_monitor_120000 on server2
tengine[4459]: 2009/01/12_22:34:33 info: match_graph_event: Action
server2-fencing_monitor_120000 (2) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/12_22:34:33 info: run_graph: Transition 31: (Complete=4,
Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[4459]: 2009/01/12_22:34:33 info: notify_crmd: Transition 31 status:
te_complete - <null>
crmd[3623]: 2009/01/12_22:34:33 info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
cib[3619]: 2009/01/12_22:42:19 info: cib_stats: Processed 49 operations
(4285.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_22:52:19 info: cib_stats: Processed 39 operations
(4102.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_23:02:19 info: cib_stats: Processed 40 operations
(6000.00us average, 0% utilization) in the last 10min
lrmd[3620]: 2009/01/12_23:08:37 WARN: G_SIG_dispatch: Dispatch function for
SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x514c58)
lrmd[3620]: 2009/01/12_23:08:37 info: G_SIG_dispatch: started at 1731484574
should have started at 1731484474
cib[3619]: 2009/01/12_23:12:19 info: cib_stats: Processed 40 operations
(4000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_23:22:19 info: cib_stats: Processed 40 operations
(5000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_23:32:19 info: cib_stats: Processed 39 operations
(7435.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_23:42:19 info: cib_stats: Processed 40 operations
(5250.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/12_23:52:19 info: cib_stats: Processed 40 operations
(6500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:02:19 info: cib_stats: Processed 39 operations
(5641.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:12:19 info: cib_stats: Processed 40 operations
(6000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:22:19 info: cib_stats: Processed 40 operations
(4750.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:32:19 info: cib_stats: Processed 40 operations
(4500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:42:19 info: cib_stats: Processed 39 operations
(5128.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_00:52:19 info: cib_stats: Processed 40 operations
(4000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:02:19 info: cib_stats: Processed 40 operations
(5500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:12:19 info: cib_stats: Processed 40 operations
(4500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:22:19 info: cib_stats: Processed 39 operations
(4358.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:32:19 info: cib_stats: Processed 40 operations
(4750.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:42:19 info: cib_stats: Processed 40 operations
(7000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_01:52:19 info: cib_stats: Processed 40 operations
(5500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:02:19 info: cib_stats: Processed 39 operations
(5384.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:12:19 info: cib_stats: Processed 40 operations
(5000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:22:19 info: cib_stats: Processed 40 operations
(6750.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:32:19 info: cib_stats: Processed 39 operations
(5897.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:42:19 info: cib_stats: Processed 40 operations
(3750.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_02:52:19 info: cib_stats: Processed 40 operations
(5000.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_03:02:19 info: cib_stats: Processed 40 operations
(5500.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_03:12:19 info: cib_stats: Processed 39 operations
(6153.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_03:22:19 info: cib_stats: Processed 40 operations
(4750.00us average, 0% utilization) in the last 10min
cib[3619]: 2009/01/13_03:32:19 info: cib_stats: Processed 40 operations
(4750.00us average, 0% utilization) in the last 10min
tengine[4459]: 2009/01/13_03:40:38 info: process_graph_event: Detected action
server1-fencing_monitor_120000 from a different transition: 29 vs. 31
tengine[4459]: 2009/01/13_03:40:38 info: update_abort_priority: Abort priority
upgraded to 1000000
tengine[4459]: 2009/01/13_03:40:38 WARN: update_failcount: Updating failcount
for server1-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc after failed
monitor: rc=14
crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
origin=route_message ]
crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: All 2 cluster nodes
are eligible to run resources.
pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node server2
is online
pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing failed op
server1-fencing_monitor_120000 on server2: Error
pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node server1
is online
pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
server2-fencing_start_0 (rc=1) on server1 to an ERROR
pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing failed op
server2-fencing_start_0 on server1: Error
pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Compatability handling
for failed op server2-fencing_start_0 on server1
pengine[4460]: 2009/01/13_03:40:38 notice: group_print: Resource Group: group_1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: commonIP
(heartbeat::ocf:IPaddr): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: drbddisk_2
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: Filesystem_3
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: drbddisk_4
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: Filesystem_5
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: MailTo_6
(heartbeat::ocf:MailTo): Started server1
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: server1-fencing
(stonith:external/ipmi): Started server2 FAILED
pengine[4460]: 2009/01/13_03:40:38 notice: native_print: server2-fencing
(stonith:external/ipmi): Started server2
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
commonIP (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
drbddisk_2 (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
Filesystem_3 (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
drbddisk_4 (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
Filesystem_5 (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource
MailTo_6 (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Recover resource
server1-fencing (server1)
pengine[4460]: 2009/01/13_03:40:38 notice: StopRsc: server2 Stop
server1-fencing
pengine[4460]: 2009/01/13_03:40:38 notice: StartRsc: server1 Start
server1-fencing
pengine[4460]: 2009/01/13_03:40:38 notice: RecurringOp: server1
server1-fencing_monitor_120000
pengine[4460]: 2009/01/13_03:40:38 WARN: native_color: Resource server2-fencing
cannot run anywhere
pengine[4460]: 2009/01/13_03:40:38 notice: StopRsc: server2 Stop
server2-fencing
pengine[4460]: 2009/01/13_03:40:38 WARN: stage6: Scheduling Node server2 for
STONITH
pengine[4460]: 2009/01/13_03:40:38 WARN: native_stop_constraints: Stop of
failed resource server1-fencing is implicit after server2 is fenced
pengine[4460]: 2009/01/13_03:40:38 info: native_stop_constraints:
server2-fencing_stop_0 is implicit after server2 is fenced
crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[4459]: 2009/01/13_03:40:38 info: unpack_graph: Unpacked transition 32:
7 actions in 7 synapses
tengine[4459]: 2009/01/13_03:40:38 info: te_pseudo_action: Pseudo action 2
fired and confirmed
tengine[4459]: 2009/01/13_03:40:38 info: send_rsc_command: Initiating action
29: server1-fencing_start_0 on server1
tengine[4459]: 2009/01/13_03:40:38 info: te_pseudo_action: Pseudo action 31
fired and confirmed
crmd[3623]: 2009/01/13_03:40:38 info: do_lrm_rsc_op: Performing
op=server1-fencing_start_0 key=29:32:8686f9af-9ced-43ab-bf20-be6e8437abc0)
lrmd[3620]: 2009/01/13_03:40:38 info: rsc:server1-fencing: start
lrmd[27785]: 2009/01/13_03:40:38 info: Try to start STONITH resource
<rsc_id=server1-fencing> : Device=external/ipmi
pengine[4460]: 2009/01/13_03:40:38 WARN: process_pe_message: Transition 32:
WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-warn-6.bz2
pengine[4460]: 2009/01/13_03:40:38 info: process_pe_message: Configuration
WARNINGs found during PE processing. Please run "crm_verify -L" to identify
issues.
crmd[3623]: 2009/01/13_03:40:39 info: process_lrm_event: LRM operation
server1-fencing_start_0 (call=51, rc=0) complete
tengine[4459]: 2009/01/13_03:40:39 info: match_graph_event: Action
server1-fencing_start_0 (29) confirmed on server1 (rc=0)
tengine[4459]: 2009/01/13_03:40:39 info: send_rsc_command: Initiating action
30: server1-fencing_monitor_120000 on server1
tengine[4459]: 2009/01/13_03:40:39 info: te_pseudo_action: Pseudo action 32
fired and confirmed
tengine[4459]: 2009/01/13_03:40:39 info: te_fence_node: Executing reboot
fencing operation (33) on server2 (timeout=100000)
crmd[3623]: 2009/01/13_03:40:39 info: do_lrm_rsc_op: Performing
op=server1-fencing_monitor_120000
key=30:32:8686f9af-9ced-43ab-bf20-be6e8437abc0)
stonithd[3621]: 2009/01/13_03:40:39 info: client tengine [pid: 4459] want a
STONITH operation RESET to node server2.
stonithd[3621]: 2009/01/13_03:40:39 info: Broadcasting the message succeeded:
require others to stonith node server2.
crmd[3623]: 2009/01/13_03:40:39 info: process_lrm_event: LRM operation
server1-fencing_monitor_120000 (call=52, rc=0) complete
tengine[4459]: 2009/01/13_03:40:39 info: match_graph_event: Action
server1-fencing_monitor_120000 (30) confirmed on server1 (rc=0)
stonithd[3621]: 2009/01/13_03:42:19 ERROR: Failed to STONITH the node server2:
optype=RESET, op_result=TIMEOUT
tengine[4459]: 2009/01/13_03:42:19 info: tengine_stonith_callback: call=-2,
optype=1, node_name=server2, result=2, node_list=,
action=33:32:8686f9af-9ced-43ab-bf20-be6e8437abc0
tengine[4459]: 2009/01/13_03:42:19 ERROR: tengine_stonith_callback: Stonith of
server2 failed (2)... aborting transition.
tengine[4459]: 2009/01/13_03:42:19 info: update_abort_priority: Abort priority
upgraded to 1000000
tengine[4459]: 2009/01/13_03:42:19 info: update_abort_priority: Abort action 0
superceeded by 2
tengine[4459]: 2009/01/13_03:42:19 info: run_graph:
====================================================
tengine[4459]: 2009/01/13_03:42:19 notice: run_graph: Transition 32:
(Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=0)
crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
origin=route_message ]
crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: All 2 cluster nodes
are eligible to run resources.
pengine[4460]: 2009/01/13_03:42:19 info: determine_online_status: Node server2
is online
pengine[4460]: 2009/01/13_03:42:19 ERROR: unpack_rsc_op: Remapping
server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Processing failed op
server1-fencing_monitor_120000 on server2: Error
pengine[4460]: 2009/01/13_03:42:19 info: determine_online_status: Node server1
is online
pengine[4460]: 2009/01/13_03:42:19 ERROR: native_add_running: Resource
stonith::external/ipmi:server1-fencing appears to be active on 2 nodes.
pengine[4460]: 2009/01/13_03:42:19 ERROR: See
http://linux-ha.org/v2/faq/resource_too_active for more information.
pengine[4460]: 2009/01/13_03:42:19 ERROR: unpack_rsc_op: Remapping
server2-fencing_start_0 (rc=1) on server1 to an ERROR
pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Processing failed op
server2-fencing_start_0 on server1: Error
pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Compatability handling
for failed op server2-fencing_start_0 on server1
pengine[4460]: 2009/01/13_03:42:19 notice: group_print: Resource Group: group_1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: commonIP
(heartbeat::ocf:IPaddr): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: drbddisk_2
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: Filesystem_3
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: drbddisk_4
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: Filesystem_5
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: MailTo_6
(heartbeat::ocf:MailTo): Started server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: server1-fencing
(stonith:external/ipmi)
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: 0 : server2
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: 1 : server1
pengine[4460]: 2009/01/13_03:42:19 notice: native_print: server2-fencing
(stonith:external/ipmi): Started server2
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
commonIP (server1)
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
drbddisk_2 (server1)
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
Filesystem_3 (server1)
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
drbddisk_4 (server1)
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
Filesystem_5 (server1)
pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource
MailTo_6 (server1)
pengine[4460]: 2009/01/13_03:42:19 ERROR: native_create_actions: Attempting
recovery of resource server1-fencing
pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc: server2 Stop
server1-fencing
pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc: server1 Stop
server1-fencing
pengine[4460]: 2009/01/13_03:42:19 notice: StartRsc: server1 Start
server1-fencing
pengine[4460]: 2009/01/13_03:42:19 notice: RecurringOp: server1
server1-fencing_monitor_120000
pengine[4460]: 2009/01/13_03:42:19 WARN: native_color: Resource server2-fencing
cannot run anywhere
pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc: server2 Stop
server2-fencing
pengine[4460]: 2009/01/13_03:42:19 WARN: stage6: Scheduling Node server2 for
STONITH
pengine[4460]: 2009/01/13_03:42:19 WARN: native_stop_constraints: Stop of
failed resource server1-fencing is implicit after server2 is fenced
pengine[4460]: 2009/01/13_03:42:19 info: native_stop_constraints:
server2-fencing_stop_0 is implicit after server2 is fenced
crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[4459]: 2009/01/13_03:42:19 info: unpack_graph: Unpacked transition 33:
8 actions in 8 synapses
tengine[4459]: 2009/01/13_03:42:19 info: te_pseudo_action: Pseudo action 2
fired and confirmed
tengine[4459]: 2009/01/13_03:42:19 info: te_pseudo_action: Pseudo action 32
fired and confirmed
tengine[4459]: 2009/01/13_03:42:19 notice: run_graph:
====================================================
tengine[4459]: 2009/01/13_03:42:19 WARN: run_graph: Transition 33: (Complete=2,
Pending=0, Fired=0, Skipped=0, Incomplete=6)
tengine[4459]: 2009/01/13_03:42:19 ERROR: te_graph_trigger: Transition failed:
terminated
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Graph 33 (8 actions in 8
synapses): batch-limit=30 jobs, network-delay=60000ms
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 0 was confirmed
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 1 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 8]: Pending
(id: server1-fencing_monitor_120000, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 31]: Pending
(id: server1-fencing_start_0, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 2 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 30]: Pending
(id: server1-fencing_stop_0, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 13]: Pending
(id: all_stopped, type: pseduo, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 3 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 31]: Pending
(id: server1-fencing_start_0, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 2]:
Completed (id: server1-fencing_stop_0, type: pseduo, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 30]: Pending
(id: server1-fencing_stop_0, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 4 was confirmed
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 5 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 13]: Pending
(id: all_stopped, type: pseduo, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 34]: Pending
(id: stonith, loc: server2, type: crm, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 6 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 33]: Pending
(id: stonith_up, type: pseduo, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 31]: Pending
(id: server1-fencing_start_0, loc: server1, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 7 is pending
(priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: [Action 34]: Pending
(id: stonith, loc: server2, type: crm, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem: * [Input 33]: Pending
(id: stonith_up, type: pseduo, priority: 0)
tengine[4459]: 2009/01/13_03:42:19 info: notify_crmd: Transition 33 status:
te_complete - <null>
crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
pengine[4460]: 2009/01/13_03:42:19 ERROR: process_pe_message: Transition 33:
ERRORs found during PE processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-error-22.bz2
pengine[4460]: 2009/01/13_03:42:19 info: process_pe_message: Configuration
WARNINGs found during PE processing. Please run "crm_verify -L" to identify
issues.
cib[3619]: 2009/01/13_03:42:19 info: cib_stats: Processed 47 operations
(4680.00us average, 0% utilization) in the last 10min
tengine[4459]: 2009/01/13_03:42:38 info: process_graph_event: Detected action
server1-fencing_monitor_120000 from a different transition: 29 vs. 33
tengine[4459]: 2009/01/13_03:42:38 info: update_abort_priority: Abort priority
upgraded to 1000000
crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
origin=route_message ]
crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: All 2 cluster nodes
are eligible to run resources.
pengine[4460]: 2009/01/13_03:42:38 info: determine_online_status: Node server2
is online
pengine[4460]: 2009/01/13_03:42:38 info: determine_online_status: Node server1
is online
pengine[4460]: 2009/01/13_03:42:38 ERROR: native_add_running: Resource
stonith::external/ipmi:server1-fencing appears to be active on 2 nodes.
pengine[4460]: 2009/01/13_03:42:38 ERROR: See
http://linux-ha.org/v2/faq/resource_too_active for more information.
pengine[4460]: 2009/01/13_03:42:38 ERROR: unpack_rsc_op: Remapping
server2-fencing_start_0 (rc=1) on server1 to an ERROR
pengine[4460]: 2009/01/13_03:42:38 WARN: unpack_rsc_op: Processing failed op
server2-fencing_start_0 on server1: Error
pengine[4460]: 2009/01/13_03:42:38 WARN: unpack_rsc_op: Compatability handling
for failed op server2-fencing_start_0 on server1
pengine[4460]: 2009/01/13_03:42:38 notice: group_print: Resource Group: group_1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: commonIP
(heartbeat::ocf:IPaddr): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: drbddisk_2
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: Filesystem_3
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: drbddisk_4
(heartbeat:drbddisk): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: Filesystem_5
(heartbeat::ocf:Filesystem): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: MailTo_6
(heartbeat::ocf:MailTo): Started server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: server1-fencing
(stonith:external/ipmi)
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: 0 : server2
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: 1 : server1
pengine[4460]: 2009/01/13_03:42:38 notice: native_print: server2-fencing
(stonith:external/ipmi): Started server2
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
commonIP (server1)
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
drbddisk_2 (server1)
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
Filesystem_3 (server1)
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
drbddisk_4 (server1)
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
Filesystem_5 (server1)
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
MailTo_6 (server1)
pengine[4460]: 2009/01/13_03:42:38 ERROR: native_create_actions: Attempting
recovery of resource server1-fencing
pengine[4460]: 2009/01/13_03:42:38 notice: StopRsc: server2 Stop
server1-fencing
pengine[4460]: 2009/01/13_03:42:38 notice: StopRsc: server1 Stop
server1-fencing
pengine[4460]: 2009/01/13_03:42:38 notice: StartRsc: server2 Start
server1-fencing
pengine[4460]: 2009/01/13_03:42:38 notice: RecurringOp: server2
server1-fencing_monitor_120000
pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource
server2-fencing (server2)
crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[4459]: 2009/01/13_03:42:38 info: unpack_graph: Unpacked transition 34:
5 actions in 5 synapses
tengine[4459]: 2009/01/13_03:42:38 info: te_pseudo_action: Pseudo action 13
fired and confirmed
tengine[4459]: 2009/01/13_03:42:38 info: send_rsc_command: Initiating action
30: server1-fencing_stop_0 on server2
tengine[4459]: 2009/01/13_03:42:38 info: send_rsc_command: Initiating action
31: server1-fencing_stop_0 on server1
crmd[3623]: 2009/01/13_03:42:38 info: do_lrm_rsc_op: Performing
op=server1-fencing_stop_0 key=31:34:8686f9af-9ced-43ab-bf20-be6e8437abc0)
lrmd[3620]: 2009/01/13_03:42:38 info: rsc:server1-fencing: stop
crmd[3623]: 2009/01/13_03:42:38 info: process_lrm_event: LRM operation
server1-fencing_monitor_120000 (call=52, rc=-2) Cancelled
pengine[4460]: 2009/01/13_03:42:38 ERROR: process_pe_message: Transition 34:
ERRORs found during PE processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-error-23.bz2
lrmd[28014]: 2009/01/13_03:42:38 info: Try to stop STONITH resource
<rsc_id=server1-fencing> : Device=external/ipmi
crmd[3623]: 2009/01/13_03:42:38 info: process_lrm_event: LRM operation
server1-fencing_stop_0 (call=53, rc=0) complete
tengine[4459]: 2009/01/13_03:42:38 info: match_graph_event: Action
server1-fencing_stop_0 (31) confirmed on server1 (rc=0)
tengine[4459]: 2009/01/13_03:42:39 info: match_graph_event: Action
server1-fencing_stop_0 (30) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/13_03:42:39 info: send_rsc_command: Initiating action
32: server1-fencing_start_0 on server2
tengine[4459]: 2009/01/13_03:42:41 info: match_graph_event: Action
server1-fencing_start_0 (32) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/13_03:42:41 info: send_rsc_command: Initiating action 1:
server1-fencing_monitor_120000 on server2
tengine[4459]: 2009/01/13_03:42:42 info: match_graph_event: Action
server1-fencing_monitor_120000 (1) confirmed on server2 (rc=0)
tengine[4459]: 2009/01/13_03:42:42 info: run_graph: Transition 34: (Complete=5,
Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[4459]: 2009/01/13_03:42:42 info: notify_crmd: Transition 34 status:
te_complete - <null>
crmd[3623]: 2009/01/13_03:42:42 info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
cib[3619]: 2009/01/13_03:52:19 info: cib_stats: Processed 47 operations
(6382.00us average, 0% utilization) in the last 10min
lrmd[3620]: 2009/01/13_03:59:32 WARN: G_SIG_dispatch: Dispatch function for
SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x514c58)
lrmd[3620]: 2009/01/13_03:59:32 info: G_SIG_dispatch: started at 1733230017
should have started at 1733229917