Hi,
I use DRBD and Heartbeat to build an HA cluster. But when I reboot or shut down the
server, it runs into an infinite loop.
The log file shows the following:
crmd[3852]: 2013/01/10_10:22:18 info: process_lrm_event: LRM operation
tomcatd_4_start_0 (call=157169, rc=0) complete
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing
op=tomcatd_4_monitor_10000 key=2:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing
op=ywproxy.sh_5_start_0 key=22:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation
tomcatd_4_monitor_10000 (call=157154, rc=-2) Cancelled
crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation
tomcatd_4_monitor_10000 (call=157170, rc=0) complete
heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/mgmtd
-v process group 3853 with signal 15
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
mgmtd[3853]: 2013/01/10_10:22:20 info: mgmtd is shutting down
mgmtd[3853]: 2013/01/10_10:22:20 debug: [mgmtd] stopped
heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/crmd
process group 3852 with signal 15
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
crmd[3852]: 2013/01/10_10:22:20 info: crm_shutdown: Requesting shutdown
crmd[3852]: 2013/01/10_10:22:20 info: do_shutdown_req: Sending shutdown request
to DC: node_slave
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:21 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:21 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:23 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:23 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on ucast
eth3.: No such device
heartbeat[3747]: 2013/01/10_10:22:25 WARN: Temporarily Suppressing write error
messages
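To make the repetition in the log above easier to see, here is a minimal Python sketch that joins the wrapped lines back into records and counts identical messages. The names parse_records and summarize are only illustrative, and the record layout (daemon[pid]: timestamp LEVEL: message) is assumed from the excerpt above, not taken from any Heartbeat documentation.

```python
import re
from collections import Counter

# Start of a record as it appears in the excerpt: daemon[pid]: timestamp LEVEL: message
RECORD = re.compile(r"^(\w+)\[(\d+)\]: (\S+) (\w+): (.*)$")


def parse_records(text):
    """Rebuild records from wrapped log text; unmatched lines continue the previous record."""
    records = []
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            daemon, pid, stamp, level, message = match.groups()
            records.append({"daemon": daemon, "pid": int(pid), "time": stamp,
                            "level": level, "message": message})
        elif records and line.strip():
            # Continuation of a line that the mail client wrapped.
            records[-1]["message"] += " " + line.strip()
    return records


def summarize(records):
    """Count how often each (daemon, level, message) combination occurs."""
    return Counter((r["daemon"], r["level"], r["message"]) for r in records)


# A few lines copied from the log above.
SAMPLE = """heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast
packet: No such device
heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast
eth3.: No such device
mgmtd[3853]: 2013/01/10_10:22:20 info: mgmtd is shutting down
heartbeat[3747]: 2013/01/10_10:22:21 ERROR: glib: Unable to send [-1] ucast
packet: No such device
"""

if __name__ == "__main__":
    for (daemon, level, message), count in summarize(parse_records(SAMPLE)).most_common():
        print(count, daemon, level, message)
```

Run against the full /var/log/ha-log, this would show which messages dominate during the shutdown.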
My ha.cf is as follows:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth3 192.168.188.193
auto_failback on
node node_master
node node_slave
crm yes
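The errors in the log name eth3, which is the device on the ucast line of this ha.cf. As a rough sketch only, the following Python reads the directives from ha.cf-style text and reports which ucast devices are not among the interfaces currently present. The function names are hypothetical, and reading /sys/class/net assumes a Linux host. It only checks whether the device exists at the moment it runs and does not explain why the device might be missing.

```python
import os


def parse_ha_cf(text):
    """Return (directive, arguments) pairs, skipping blank lines and # comments."""
    directives = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        directives.append((parts[0], parts[1:]))
    return directives


def ucast_interfaces(directives):
    """Device names given as the first argument of each ucast directive."""
    return [args[0] for name, args in directives if name == "ucast" and args]


def present_interfaces(sysfs="/sys/class/net"):
    """Interfaces the kernel currently exposes (Linux-specific assumption)."""
    return sorted(os.listdir(sysfs)) if os.path.isdir(sysfs) else []


def missing_interfaces(wanted, present):
    """Devices named in ha.cf that are not in the list of present interfaces."""
    return [dev for dev in wanted if dev not in present]


# The ha.cf from this post.
SAMPLE = """debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth3 192.168.188.193
auto_failback on
node node_master
node node_slave
crm yes
"""

if __name__ == "__main__":
    wanted = ucast_interfaces(parse_ha_cf(SAMPLE))
    print("ucast devices:", wanted)
    print("missing right now:", missing_interfaces(wanted, present_interfaces()))
```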
The cib.xml is as follows:
<cib admin_epoch="0" epoch="1" have_quorum="true" ignore_dtd="false"
num_peers="0" cib_feature_revision="2.0" generated="false" num_updates="4"
cib-last-written="Thu Nov 29 20:41:32 2012" ccm_transition="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-symmetric-cluster"
name="symmetric-cluster" value="true"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="stop"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="0"/>
<nvpair
id="cib-bootstrap-options-default-resource-failure-stickiness"
name="default-resource-failure-stickiness" value="0"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-stonith-action"
name="stonith-action" value="reboot"/>
<nvpair id="cib-bootstrap-options-startup-fencing"
name="startup-fencing" value="true"/>
<nvpair id="cib-bootstrap-options-stop-orphan-resources"
name="stop-orphan-resources" value="true"/>
<nvpair id="cib-bootstrap-options-stop-orphan-actions"
name="stop-orphan-actions" value="true"/>
<nvpair id="cib-bootstrap-options-remove-after-stop"
name="remove-after-stop" value="false"/>
<nvpair id="cib-bootstrap-options-short-resource-names"
name="short-resource-names" value="true"/>
<nvpair id="cib-bootstrap-options-transition-idle-timeout"
name="transition-idle-timeout" value="5min"/>
<nvpair id="cib-bootstrap-options-default-action-timeout"
name="default-action-timeout" value="20s"/>
<nvpair id="cib-bootstrap-options-is-managed-default"
name="is-managed-default" value="true"/>
<nvpair id="cib-bootstrap-options-cluster-delay"
name="cluster-delay" value="60s"/>
<nvpair id="cib-bootstrap-options-pe-error-series-max"
name="pe-error-series-max" value="-1"/>
<nvpair id="cib-bootstrap-options-pe-warn-series-max"
name="pe-warn-series-max" value="-1"/>
<nvpair id="cib-bootstrap-options-pe-input-series-max"
name="pe-input-series-max" value="-1"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
</nodes>
<resources>
<group id="group_1">
<primitive class="heartbeat" id="drbddisk_1" provider="heartbeat"
type="drbddisk">
<operations>
<op id="drbddisk_1_mon" interval="10s" name="monitor"
timeout="20s"/>
</operations>
<instance_attributes id="drbddisk_1_inst_attr">
<attributes>
<nvpair id="drbddisk_1_attr_1" name="1" value="r0"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="ocf" id="Filesystem_2" provider="heartbeat"
type="Filesystem">
<operations>
<op id="Filesystem_2_mon" interval="10s" name="monitor"
timeout="20s"/>
</operations>
<instance_attributes id="Filesystem_2_inst_attr">
<attributes>
<nvpair id="Filesystem_2_attr_0" name="device"
value="/dev/drbd1"/>
<nvpair id="Filesystem_2_attr_1" name="directory" value="/data"/>
<nvpair id="Filesystem_2_attr_2" name="fstype" value="ext3"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="lsb" id="pgsql_3" provider="heartbeat" type="pgsql">
<operations>
<op id="pgsql_3_mon" interval="10s" name="monitor" timeout="20s"/>
</operations>
</primitive>
<primitive class="heartbeat" id="tomcatd_4" provider="heartbeat"
type="tomcatd">
<operations>
<op id="tomcatd_4_mon" interval="10s" name="monitor"
timeout="20s"/>
</operations>
</primitive>
<primitive class="heartbeat" id="ywproxy.sh_5" provider="heartbeat"
type="ywproxy.sh">
<operations>
<op id="ywproxy.sh_5_mon" interval="10s" name="monitor"
timeout="30s"/>
</operations>
</primitive>
<primitive class="heartbeat" id="http_proxy.sh_6" provider="heartbeat"
type="http_proxy.sh">
<operations>
<op id="http_proxy.sh_6_mon" interval="10s" name="monitor" timeout="20s"/>
</operations>
</primitive>
<primitive class="ocf" id="IPaddr_59_65_233_194" provider="heartbeat"
type="IPaddr">
<operations>
<op id="IPaddr_59_65_233_194_mon" interval="5s" name="monitor"
timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_59_65_233_194_inst_attr">
<attributes>
<nvpair id="IPaddr_59_65_233_194_attr_0" name="ip"
value="59.65.233.194"/>
</attributes>
</instance_attributes>
</primitive>
</group>
</resources>
<constraints>
<rsc_location id="rsc_location_group_1" rsc="group_1">
<rule id="prefered_location_group_1" score="100">
<expression attribute="#uname" id="prefered_location_group_1_expr"
operation="eq" value="node_master"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
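Because the cib.xml above is long, here is a small Python sketch that lists the primitives of each group in document order, together with their monitor interval. It uses only the standard xml.etree.ElementTree module. The function name list_primitives is illustrative, and the sample is a shortened copy of the file above with only two of the seven primitives.

```python
import xml.etree.ElementTree as ET


def list_primitives(cib_xml):
    """Return (group id, primitive id, class, type, monitor interval) in document order."""
    root = ET.fromstring(cib_xml.strip())
    rows = []
    for group in root.iter("group"):
        for prim in group.findall("primitive"):
            monitor = prim.find("operations/op[@name='monitor']")
            interval = monitor.get("interval") if monitor is not None else None
            rows.append((group.get("id"), prim.get("id"),
                         prim.get("class"), prim.get("type"), interval))
    return rows


# Shortened copy of the cib.xml above (two primitives only).
SAMPLE = """
<cib>
  <configuration>
    <resources>
      <group id="group_1">
        <primitive class="heartbeat" id="drbddisk_1" provider="heartbeat" type="drbddisk">
          <operations>
            <op id="drbddisk_1_mon" interval="10s" name="monitor" timeout="20s"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="IPaddr_59_65_233_194" provider="heartbeat" type="IPaddr">
          <operations>
            <op id="IPaddr_59_65_233_194_mon" interval="5s" name="monitor" timeout="5s"/>
          </operations>
        </primitive>
      </group>
    </resources>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    for row in list_primitives(SAMPLE):
        print(row)
```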
I don't know where the problem is. Thank you very much for your time. I am
looking forward to your reply.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems