Hi all,
My resource group does not migrate to the other, idle node when one
resource in the group fails. I have read the relevant documentation, such as:
http://www.linux-ha.org/v2/faq/forced_failover
http://clusterlabs.org/mw/Image:Configuration_Explained.pdf
and I think I have followed the steps for this topic, but it doesn't help.
My key cib.xml configuration is as follows:
<configuration>
  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <nvpair id="cib-bootstrap-options-default-resource-stickiness"
                name="default-resource-stickiness" value="100"/>
        <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness"
                name="default-resource-failure-stickiness" value="-99999"/>
................................................................
<group id="group1">
  <primitive id="ap_1_ip_217" class="ocf" type="IPaddr2"
             provider="heartbeat">
    <instance_attributes id="ap_1_ip_attr_1">
      <attributes>
        <nvpair id="ap_1_ipaddr_1" name="ip" value="10.1.41.217"/>
        <nvpair id="ap_1_nic_1" name="nic" value="eth1:1"/>
        <nvpair id="ap_1_mask_1" name="netmask" value="25"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="ap_1_ip_217_agent" class="ocf" type="myagent"
             provider="heartbeat">
    <operations>
      <op id="1" name="monitor" interval="5s" timeout="4s"
          on_fail="restart"/>
    </operations>
  </primitive>
</group>
.........................................
I gave group "group1" the same location score on each node.
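For reference, equal-score placement like this is usually expressed in Heartbeat v2 CIB syntax with one rsc_location rule per node. The sketch below only illustrates the shape of such constraints; the node names node1/node2, the constraint ids and the score of 100 are hypothetical placeholders, not copied from my actual cib.xml:

```xml
<constraints>
  <!-- Hypothetical: node names, ids and score are placeholders -->
  <rsc_location id="loc_group1_node1" rsc="group1">
    <rule id="loc_group1_node1_rule" score="100">
      <expression id="loc_group1_node1_expr"
                  attribute="#uname" operation="eq" value="node1"/>
    </rule>
  </rsc_location>
  <!-- Same score on the second node, so neither node is preferred -->
  <rsc_location id="loc_group1_node2" rsc="group1">
    <rule id="loc_group1_node2_rule" score="100">
      <expression id="loc_group1_node2_expr"
                  attribute="#uname" operation="eq" value="node2"/>
    </rule>
  </rsc_location>
</constraints>
```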
When "ap_1_ip_217_agent" failed, group "group1" did not migrate to the other
node (even though I set on_fail="restart").
Instead, Heartbeat just tried to restart "ap_1_ip_217_agent" on the
same node.
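As I understand the rule described in the forced_failover FAQ (this is my assumption, and it ignores any extra stickiness a group may accumulate from its members), the score of group1 on each node should work out as follows, where S is the equal location score and f is the fail count of the failed resource:

$$
\begin{aligned}
\text{score}_{\text{current}} &= S + 100 + f \cdot (-99999) \\
\text{score}_{\text{other}} &= S \\
f = 1:\quad \text{score}_{\text{current}} &= S - 99899 < S
\end{aligned}
$$

So after a single failure I expected the other node to win, but Heartbeat keeps restarting the resource on the same node.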
I got the following log messages:
......................................................................
Sep 25 09:23:22 ISD32_158_sles10 crmd: [6167]: info: do_lrm_rsc_op: Performing op=ap_1_ip_217_agent_start_0 key=9:87:cdd56ff7-5350-4292-b377-8d1081aa9789)
Sep 25 09:23:22 ISD32_158_sles10 crmd: [6167]: ERROR: process_lrm_event: LRM operation ap_1_ip_217_agent_start_0 (call=178, rc=1) Error unknown error
Sep 25 09:23:22 ISD32_158_sles10 crmd: [6167]: info: append_restart_list: Resource ap_1_ip_217_agent does not support reloads
Sep 25 09:23:22 ISD32_158_sles10 cib: [6163]: info: cib_diff_notify: Update (client: 6167, call:199): 0.633.17586 -> 0.633.17587 (ok)
Sep 25 09:23:22 ISD32_158_sles10 cib: [28327]: info: write_cib_contents: Wrote version 0.633.17587 of the CIB to disk (digest: 89b25c4462439753f76d7c98b33c322f)
Sep 25 09:23:22 ISD32_158_sles10 crmd: [6167]: info: do_lrm_rsc_op: Performing op=ap_1_ip_217_agent_stop_0 key=1:88:cdd56ff7-5350-4292-b377-8d1081aa9789)
Sep 25 09:23:22 ISD32_158_sles10 crmd: [6167]: info: process_lrm_event: LRM operation ap_1_ip_217_agent_stop_0 (call=179, rc=0) complete
Sep 25 09:23:23 ISD32_158_sles10 cib: [6163]: info: cib_diff_notify: Update (client: 6167, call:200): 0.633.17587 -> 0.633.17588 (ok)
Sep 25 09:23:23 ISD32_158_sles10 cib: [28335]: info: write_cib_contents: Wrote version 0.633.17588 of the CIB to disk (digest: 6fd2b1136af3aee6e198ff5433daf282)
Sep 25 09:23:23 ISD32_158_sles10 crmd: [6167]: info: do_lrm_rsc_op: Performing op=ap_1_ip_217_agent_start_0 key=9:88:cdd56ff7-5350-4292-b377-8d1081aa9789)
Sep 25 09:23:23 ISD32_158_sles10 crmd: [6167]: ERROR: process_lrm_event: LRM operation ap_1_ip_217_agent_start_0 (call=180, rc=1) Error unknown error
Sep 25 09:23:23 ISD32_158_sles10 crmd: [6167]: info: append_restart_list: Resource ap_1_ip_217_agent does not support reload
................................................................
How can I solve this problem?
Thanks in advance!
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems