It appears the cluster is trying to start ClusterIP on node2, but it
never does, and I don't see any error preventing it. In my logs I see
the following about every 30 seconds (my reading of the loop, plus a
few checks worth running, follows the excerpt):

Aug 13 11:16:48 node2 tengine: [14098]: info: tengine_stonith_callback: call=-100, optype=1, node_name=node1, result=2, node_list=, action=5:114:a3663c3f-0b44-40c4-bd07-99d3ff079344
Aug 13 11:16:48 node2 crmd: [14085]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
Aug 13 11:16:48 node2 tengine: [14098]: info: update_abort_priority: Abort priority upgraded to 1000000
Aug 13 11:16:48 node2 crmd: [14085]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Aug 13 11:16:48 node2 tengine: [14098]: info: update_abort_priority: Abort action 0 superceeded by 2
Aug 13 11:16:48 node2 tengine: [14098]: info: run_graph: ====================================================
Aug 13 11:16:48 node2 tengine: [14098]: notice: run_graph: Transition 114: (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0)
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'true' for cluster option 'symmetric-cluster'
Aug 13 11:16:48 node2 crmd: [14085]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '0' for cluster option 'default-resource-stickiness'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'true' for cluster option 'is-managed-default'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
Aug 13 11:16:48 node2 tengine: [14098]: info: unpack_graph: Unpacked transition 115: 3 actions in 3 synapses
Aug 13 11:16:48 node2 pengine: [14099]: notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
Aug 13 11:16:48 node2 pengine: [14099]: info: determine_online_status: Node node2 is online
Aug 13 11:16:48 node2 tengine: [14098]: info: te_fence_node: Executing reboot fencing operation (5) on node1 (timeout=30000)
Aug 13 11:16:48 node2 pengine: [14099]: WARN: determine_online_status_fencing: Node node1 (24378a9e-3483-4ea4-bd7e-40a59a73a0e7) is un-expectedly down
Aug 13 11:16:48 node2 pengine: [14099]: info: determine_online_status_fencing: ^Iha_state=dead, ccm_state=false, crm_state=offline, join_state=down, expected=member
Aug 13 11:16:48 node2 stonithd: [14083]: info: client tengine [pid: 14098] want a STONITH operation RESET to node node1.
Aug 13 11:16:48 node2 pengine: [14099]: WARN: determine_online_status: Node node1 is unclean
Aug 13 11:16:48 node2 stonithd: [14083]: info: Broadcasting the message succeeded: require others to stonith node node1.
Aug 13 11:16:48 node2 pengine: [14099]: info: native_print: ClusterIP^I(heartbeat::ocf:IPaddr2):^IStarted node1
Aug 13 11:16:48 node2 pengine: [14099]: notice: NoRoleChange: Move resource ClusterIP^I(node1 -> node2)
Aug 13 11:16:48 node2 pengine: [14099]: WARN: custom_action: Action ClusterIP_stop_0 on node1 is unrunnable (offline)
Aug 13 11:16:48 node2 pengine: [14099]: WARN: custom_action: Marking node node1 unclean
Aug 13 11:16:48 node2 pengine: [14099]: notice: StartRsc:  node2^IStart ClusterIP
Aug 13 11:16:48 node2 pengine: [14099]: WARN: stage6: Scheduling Node node1 for STONITH
Aug 13 11:16:48 node2 pengine: [14099]: info: native_stop_constraints: ClusterIP_stop_0 is implicit after node1 is fenced
Aug 13 11:16:48 node2 pengine: [14099]: WARN: process_pe_message: Transition 115: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-100.raw
Aug 13 11:16:48 node2 pengine: [14099]: info: process_pe_message: Configuration WARNINGs found during PE processing.  Please run "crm_verify -L" to identify issues.

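Reading the loop above: the pengine wants to move ClusterIP
(NoRoleChange: node1 -> node2), but the stop on node1 is unrunnable and
only becomes implicit "after node1 is fenced". The tengine therefore
asks stonithd to reset node1 (te_fence_node), the stonith callback
comes back with result=2 rather than a success, the transition aborts,
and the whole cycle repeats about every 30 seconds. Until a fence of
node1 actually succeeds, the start of ClusterIP on node2 stays blocked,
which matches what I am seeing. A minimal set of checks, assuming the
heartbeat 2.x command-line tools (stonith, crm_verify, crm_attribute);
the last command is for test clusters only:

# List the STONITH plugins available on this node.
stonith -L

# Sanity-check the live configuration, as the pengine suggested.
crm_verify -L -V

# TEST CLUSTERS ONLY: disable fencing so recovery no longer waits on a
# fence that can never succeed (crm_attribute syntax assumed for 2.x).
crm_attribute -t crm_config -n stonith-enabled -v false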

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Brossard
Sent: Monday, August 13, 2007 11:07 AM
To: General Linux-HA mailing list
Subject: [Linux-HA] IP resource never fails over during outage

Now I am having some more weirdness I cannot figure out. I have set up
a single resource, an IP address. It comes up fine, and I can move it
between nodes using crm_resource -M -R ClusterIP. However, if I reboot
node1 while it is hosting the IP, the resource never fails over to
node2. The GUI shows it running on node1 even though node1 is offline:


[EMAIL PROTECTED]:/var/lib/heartbeat$ crm_resource -L -V
crm_resource[15028]: 2007/08/13_11:04:39 info: Invoked: crm_resource -L -V
crm_resource[15028]: 2007/08/13_11:04:39 WARN: determine_online_status_fencing: Node node1 (24378a9e-3483-4ea4-bd7e-40a59a73a0e7) is un-expectedly down
crm_resource[15028]: 2007/08/13_11:04:39 WARN: determine_online_status: Node node1 is unclean
ClusterIP       (heartbeat::ocf:IPaddr2)
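A side note on the commands: crm_resource -L only lists configured
resources, which seems to be why ClusterIP prints with no node above.
For a one-shot view of node state and resource placement together,
assuming this version of crm_mon supports the one-shot flag:

# Single status snapshot; should show node1 offline/unclean and where
# the cluster believes ClusterIP is running.
crm_mon -1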

[EMAIL PROTECTED]:/var/lib/heartbeat$ crm_resource -x -r ClusterIP
crm_resource[15029]: 2007/08/13_11:05:04 info: Invoked: crm_resource -x -r ClusterIP
ClusterIP       (heartbeat::ocf:IPaddr2):       Started node1

raw xml:

 <primitive id="ClusterIP" class="ocf" type="IPaddr2" provider="heartbeat">
   <instance_attributes id="ClusterIP_instance_attrs">
     <attributes>
       <nvpair id="3a5e34c7-d7dc-477f-8624-cc98ee7e1c41" name="ip" value="172.31.252.7"/>
     </attributes>
   </instance_attributes>
 </primitive>
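For reference, that primitive sets only the required 'ip' parameter and
defines no monitor operation, so the cluster never health-checks the
address once it is started. A fuller definition might look like the
sketch below, in the same CIB XML; the monitor interval, the netmask,
and the NIC are made-up example values for IPaddr2's optional
cidr_netmask and nic parameters:

 <primitive id="ClusterIP" class="ocf" type="IPaddr2" provider="heartbeat">
   <operations>
     <!-- hypothetical monitor op: re-check the address every 10s -->
     <op id="ClusterIP_mon" name="monitor" interval="10s" timeout="20s"/>
   </operations>
   <instance_attributes id="ClusterIP_instance_attrs">
     <attributes>
       <nvpair id="ClusterIP_ip" name="ip" value="172.31.252.7"/>
       <!-- hypothetical extras: CIDR netmask and interface for the alias -->
       <nvpair id="ClusterIP_mask" name="cidr_netmask" value="24"/>
       <nvpair id="ClusterIP_nic" name="nic" value="eth0"/>
     </attributes>
   </instance_attributes>
 </primitive>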

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
