Re: [Linux-HA] can not reboot or shutdown the server

mike Thu, 10 Jan 2013 05:14:15 -0800

Is eth3 up at the time this thing goes into its loop?
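
One quick way to check, as a minimal sketch only: it assumes a Linux host that exposes network devices under /sys/class/net, and the helper name interface_state is made up for illustration (it is not part of heartbeat). A result of None would match the "No such device" errors in the log, meaning the device is already gone while heartbeat is still trying to send on it.

```python
from pathlib import Path


def interface_state(name, sysfs_root="/sys/class/net"):
    """Return the kernel operstate for `name`, or None if the device does not exist.

    `sysfs_root` is a parameter only so the function can be pointed at a
    test directory; on a real host the default is what you want.
    """
    iface_dir = Path(sysfs_root) / name
    if not iface_dir.is_dir():
        # No entry at all: the device is not present (removed or renamed).
        return None
    try:
        return (iface_dir / "operstate").read_text().strip()
    except OSError:
        # The device exists but its state could not be read.
        return "unknown"


if __name__ == "__main__":
    state = interface_state("eth3")
    if state is None:
        print("eth3: no such device")
    else:
        print(f"eth3: operstate={state}")
```

Running this from a shutdown script just before heartbeat is stopped would show whether the network has already been torn down at that point.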


On 13-01-10 07:50 AM, 赵长松 wrote:
> Hi
> I use DRBD and Heartbeat to build an HA cluster, but when I reboot or shut down the 
> server, it runs into an infinite loop.
> The log file shows the following:
>
>
> crmd[3852]: 2013/01/10_10:22:18 info: process_lrm_event: LRM operation 
> tomcatd_4_start_0 (call=157169, rc=0) complete
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing 
> op=tomcatd_4_monitor_10000 key=2:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
> crmd[3852]: 2013/01/10_10:22:20 info: do_lrm_rsc_op: Performing 
> op=ywproxy.sh_5_start_0 key=22:10184:8e5cfe13-e5b1-43aa-b4d9-bbbd0c3f9df5)
> crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation 
> tomcatd_4_monitor_10000 (call=157154, rc=-2) Cancelled
> crmd[3852]: 2013/01/10_10:22:20 info: process_lrm_event: LRM operation 
> tomcatd_4_monitor_10000 (call=157170, rc=0) complete
> heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/mgmtd 
> -v process group 3853 with signal 15
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> mgmtd[3853]: 2013/01/10_10:22:20 info: mgmtd is shutting down
> mgmtd[3853]: 2013/01/10_10:22:20 debug: [mgmtd] stopped
> heartbeat[3742]: 2013/01/10_10:22:20 info: killing /usr/lib64/heartbeat/crmd 
> process group 3852 with signal 15
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> crmd[3852]: 2013/01/10_10:22:20 info: crm_shutdown: Requesting shutdown
> crmd[3852]: 2013/01/10_10:22:20 info: do_shutdown_req: Sending shutdown 
> request to DC: node_slave
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:20 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:21 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:21 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:23 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:23 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: glib: Unable to send [-1] ucast 
> packet: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 ERROR: write_child: write failure on 
> ucast eth3.: No such device
> heartbeat[3747]: 2013/01/10_10:22:25 WARN: Temporarily Suppressing write 
> error messages
>
>
> My ha.cf is as follows:
>
>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> ucast eth3 192.168.188.193
> auto_failback on
> node node_master
> node node_slave
> crm yes
>
>
> The cib.xml is as follows:
>   <cib admin_epoch="0" epoch="1" have_quorum="true" ignore_dtd="false" 
> num_peers="0" cib_feature_revision="2.0" generated="false" num_updates="4" 
> cib-last-written="Thu Nov 29 20:41:32 2012" ccm_transition="1">
>     <configuration>
>       <crm_config>
>         <cluster_property_set id="cib-bootstrap-options">
>           <attributes>
>             <nvpair id="cib-bootstrap-options-symmetric-cluster" 
> name="symmetric-cluster" value="true"/>
>             <nvpair id="cib-bootstrap-options-no-quorum-policy" 
> name="no-quorum-policy" value="stop"/>
>             <nvpair id="cib-bootstrap-options-default-resource-stickiness" 
> name="default-resource-stickiness" value="0"/>
>             <nvpair 
> id="cib-bootstrap-options-default-resource-failure-stickiness" 
> name="default-resource-failure-stickiness" value="0"/>
>             <nvpair id="cib-bootstrap-options-stonith-enabled" 
> name="stonith-enabled" value="false"/>
>             <nvpair id="cib-bootstrap-options-stonith-action" 
> name="stonith-action" value="reboot"/>
>             <nvpair id="cib-bootstrap-options-startup-fencing" 
> name="startup-fencing" value="true"/>
>             <nvpair id="cib-bootstrap-options-stop-orphan-resources" 
> name="stop-orphan-resources" value="true"/>
>             <nvpair id="cib-bootstrap-options-stop-orphan-actions" 
> name="stop-orphan-actions" value="true"/>
>             <nvpair id="cib-bootstrap-options-remove-after-stop" 
> name="remove-after-stop" value="false"/>
>             <nvpair id="cib-bootstrap-options-short-resource-names" 
> name="short-resource-names" value="true"/>
>             <nvpair id="cib-bootstrap-options-transition-idle-timeout" 
> name="transition-idle-timeout" value="5min"/>
>             <nvpair id="cib-bootstrap-options-default-action-timeout" 
> name="default-action-timeout" value="20s"/>
>             <nvpair id="cib-bootstrap-options-is-managed-default" 
> name="is-managed-default" value="true"/>
>             <nvpair id="cib-bootstrap-options-cluster-delay" 
> name="cluster-delay" value="60s"/>
>             <nvpair id="cib-bootstrap-options-pe-error-series-max" 
> name="pe-error-series-max" value="-1"/>
>             <nvpair id="cib-bootstrap-options-pe-warn-series-max" 
> name="pe-warn-series-max" value="-1"/>
>             <nvpair id="cib-bootstrap-options-pe-input-series-max" 
> name="pe-input-series-max" value="-1"/>
>           </attributes>
>         </cluster_property_set>
>       </crm_config>
>       <nodes>
>       </nodes>
>       <resources>
>         <group id="group_1">
>           <primitive class="heartbeat" id="drbddisk_1" provider="heartbeat" 
> type="drbddisk">
>             <operations>
>               <op id="drbddisk_1_mon" interval="10s" name="monitor" 
> timeout="20s"/>
>             </operations>
>             <instance_attributes id="drbddisk_1_inst_attr">
>               <attributes>
>                 <nvpair id="drbddisk_1_attr_1" name="1" value="r0"/>
>               </attributes>
>             </instance_attributes>
>           </primitive>
>           <primitive class="ocf" id="Filesystem_2" provider="heartbeat" 
> type="Filesystem">
>             <operations>
>               <op id="Filesystem_2_mon" interval="10s" name="monitor" 
> timeout="20s"/>
>             </operations>
>             <instance_attributes id="Filesystem_2_inst_attr">
>               <attributes>
>                 <nvpair id="Filesystem_2_attr_0" name="device" 
> value="/dev/drbd1"/>
>                 <nvpair id="Filesystem_2_attr_1" name="directory" 
> value="/data"/>
>                 <nvpair id="Filesystem_2_attr_2" name="fstype" value="ext3"/>
>               </attributes>
>             </instance_attributes>
>           </primitive>
>           <primitive class="lsb" id="pgsql_3" provider="heartbeat" 
> type="pgsql">
>             <operations>
>               <op id="pgsql_3_mon" interval="10s" name="monitor" 
> timeout="20s"/>
>             </operations>
>           </primitive>
>           <primitive class="heartbeat" id="tomcatd_4" provider="heartbeat" 
> type="tomcatd">
>             <operations>
>               <op id="tomcatd_4_mon" interval="10s" name="monitor" 
> timeout="20s"/>
>             </operations>
>           </primitive>
>           <primitive class="heartbeat" id="ywproxy.sh_5" provider="heartbeat" 
> type="ywproxy.sh">
>             <operations>
>               <op id="ywproxy.sh_5_mon" interval="10s" name="monitor" 
> timeout="30s"/>
>             </operations>
>           </primitive>
>           <primitive class="heartbeat" id="http_proxy.sh_6" 
> provider="heartbeat" type="http_proxy.sh">
>             <operations>
>               <op id="http_proxy.sh_6_mon" interval="10s" name="monitor" 
> timeout="20s"/>
>             </operations>
>           </primitive>
>           <primitive class="ocf" id="IPaddr_59_65_233_194" 
> provider="heartbeat" type="IPaddr">
>             <operations>
>               <op id="IPaddr_59_65_233_194_mon" interval="5s" name="monitor" 
> timeout="5s"/>
>             </operations>
>             <instance_attributes id="IPaddr_59_65_233_194_inst_attr">
>               <attributes>
>                 <nvpair id="IPaddr_59_65_233_194_attr_0" name="ip" 
> value="59.65.233.194"/>
>               </attributes>
>             </instance_attributes>
>           </primitive>
>         </group>
>       </resources>
>       <constraints>
>         <rsc_location id="rsc_location_group_1" rsc="group_1">
>           <rule id="prefered_location_group_1" score="100">
>             <expression attribute="#uname" 
> id="prefered_location_group_1_expr" operation="eq" value="node_master"/>
>           </rule>
>         </rsc_location>
>       </constraints>
>     </configuration>
>   </cib>
>
>
> I don't know where the problem is. Thank you very much for your time. I am 
> looking forward to your reply.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
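
The quoted ha.cf names eth3 in its ucast directive (ucast takes a device followed by the peer's IP address), so that is the one interface heartbeat needs to stay present until it has stopped. As a rough sketch, the devices an ha.cf depends on for ucast can be listed like this; the function name ucast_interfaces is hypothetical, and the sample text is a shortened copy of the config above.

```python
def ucast_interfaces(ha_cf_text):
    """Return (device, peer_ip) pairs for each `ucast` directive in ha.cf text."""
    pairs = []
    for raw in ha_cf_text.splitlines():
        # Drop comments and surrounding whitespace before splitting into fields.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ucast":
            pairs.append((fields[1], fields[2]))
    return pairs


if __name__ == "__main__":
    sample = """\
keepalive 2
udpport 694
ucast eth3 192.168.188.193
node node_master
node node_slave
"""
    for device, peer in ucast_interfaces(sample):
        print(f"heartbeat sends to {peer} via {device}")
```

Only ucast lines are handled here; bcast and mcast directives would need their own parsing.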

