Is eth3 up at the time this thing goes into its loop?
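
"No such device" usually means the interface no longer exists, not merely that the link is down, so it is worth checking the state of eth3 while the loop is running. Below is a minimal sketch of such a check, assuming a Linux host that exposes interfaces under /sys/class/net. The helper name `interface_state` and its `sysfs_root` parameter are made up for illustration and are not part of heartbeat.

```python
from pathlib import Path


def interface_state(name: str, sysfs_root: str = "/sys/class/net") -> str:
    """Report the state of a network interface as seen in sysfs.

    Returns "missing" if the interface has no sysfs entry, "unknown" if
    the entry exists but its operstate file cannot be read, and otherwise
    the kernel's operstate string (for example "up" or "down").
    """
    iface_dir = Path(sysfs_root) / name
    if not iface_dir.is_dir():
        # No sysfs entry: the kernel does not currently know this device.
        return "missing"
    try:
        return (iface_dir / "operstate").read_text().strip()
    except OSError:
        return "unknown"


if __name__ == "__main__":
    # eth3 is the interface named on the ucast line of the posted ha.cf.
    print("eth3:", interface_state("eth3"))
```

If this reports "missing" during shutdown, the network was probably torn down before heartbeat was stopped, which would be consistent with the ucast write errors in the log.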

On 13-01-10 07:50 AM, 赵长松 wrote:
> Hi
> I use DRBD and Heartbeat to build an HA cluster. But when I reboot or shut
> down the server, it runs into an infinite loop.
> The information in the logfile is as follows:
>
>
> crmd[3852]: 2013/01/10_10:22:18 info: process_lrm_event: LRM operation tomcatd_4_start_0 (call=157169, rc=0) complete
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing op=tomcatd_4_monitor_10000 key=2:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
> crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing op=ywproxy.sh_5_start_0 key=22:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
> crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation tomcatd_4_monitor_10000 (call=157154, rc=-2) Cancelled
> crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation tomcatd_4_monitor_10000 (call=157170, rc=0) complete
> heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/mgmtd -v process group 3853 with signal 15
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> mgmtd[3853]: 2013/01/10_10:22:20 info: mgmtd is shutting down
> mgmtd[3853]: 2013/01/10_10:22:20 debug: [mgmtd] stopped
> heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/crmd process group 3852 with signal 15
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> crmd[3852]: 2013/01/10_10:22:20 info: crm_shutdown: Requesting shutdown
> crmd[3852]: 2013/01/10_10:22:20 info: do_shutdown_req: Sending shutdown request to DC: node_slave
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:21 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:21 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:23 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:23 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 WARN: Temporarily Suppressing write error messages
>
>
> My ha.cf is as follows:
>
>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> ucast eth3 192.168.188.193
> auto_failback on
> node node_master
> node node_slave
> crm yes
>
>
> The cib.xml is as follows:
> <cib admin_epoch="0" epoch="1" have_quorum="true" ignore_dtd="false" num_peers="0" cib_feature_revision="2.0" generated="false" num_updates="4" cib-last-written="Thu Nov 29 20:41:32 2012" ccm_transition="1">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>     </nodes>
>     <resources>
>       <group id="group_1">
>         <primitive class="heartbeat" id="drbddisk_1" provider="heartbeat" type="drbddisk">
>           <operations>
>             <op id="drbddisk_1_mon" interval="10s" name="monitor" timeout="20s"/>
>           </operations>
>           <instance_attributes id="drbddisk_1_inst_attr">
>             <attributes>
>               <nvpair id="drbddisk_1_attr_1" name="1" value="r0"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive class="ocf" id="Filesystem_2" provider="heartbeat" type="Filesystem">
>           <operations>
>             <op id="Filesystem_2_mon" interval="10s" name="monitor" timeout="20s"/>
>           </operations>
>           <instance_attributes id="Filesystem_2_inst_attr">
>             <attributes>
>               <nvpair id="Filesystem_2_attr_0" name="device" value="/dev/drbd1"/>
>               <nvpair id="Filesystem_2_attr_1" name="directory" value="/data"/>
>               <nvpair id="Filesystem_2_attr_2" name="fstype" value="ext3"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive class="lsb" id="pgsql_3" provider="heartbeat" type="pgsql">
>           <operations>
>             <op id="pgsql_3_mon" interval="10s" name="monitor" timeout="20s"/>
>           </operations>
>         </primitive>
>         <primitive class="heartbeat" id="tomcatd_4" provider="heartbeat" type="tomcatd">
>           <operations>
>             <op id="tomcatd_4_mon" interval="10s" name="monitor" timeout="20s"/>
>           </operations>
>         </primitive>
>         <primitive class="heartbeat" id="ywproxy.sh_5" provider="heartbeat" type="ywproxy.sh">
>           <operations>
>             <op id="ywproxy.sh_5_mon" interval="10s" name="monitor" timeout="30s"/>
>           </operations>
>         </primitive>
>         <primitive class="heartbeat" id="http_proxy.sh_6" provider="heartbeat" type="http_proxy.sh">
>           <operations>
>             <op id="http_proxy.sh_6_mon" interval="10s" name="monitor" timeout="20s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="IPaddr_59_65_233_194" provider="heartbeat" type="IPaddr">
>           <operations>
>             <op id="IPaddr_59_65_233_194_mon" interval="5s" name="monitor" timeout="5s"/>
>           </operations>
>           <instance_attributes id="IPaddr_59_65_233_194_inst_attr">
>             <attributes>
>               <nvpair id="IPaddr_59_65_233_194_attr_0" name="ip" value="59.65.233.194"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>       </group>
>     </resources>
>     <constraints>
>       <rsc_location id="rsc_location_group_1" rsc="group_1">
>         <rule id="prefered_location_group_1" score="100">
>           <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="node_master"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
> </cib>
>
>
> I don't know where the problem is. Thank you very much for your time. I am
> looking forward to your reply.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
