Hi,

On Fri, Jun 26, 2009 at 11:44:05AM +0200, artur.k wrote:
> Hi
>
> I use heartbeat 2.1.4-7~bpo50+1 on Debian lenny (Xen domU) and
> I have a problem. If the network connection goes down and comes
> back up a few seconds later, heartbeat does not fail back :( In the
> log:
>
> crmd[10201]: 2009/06/26_11:06:55 WARN: crmd_ha_msg_callback: Ignoring HA message (op=noop) from storage-1: not in our membership list (size=1)
> ccm[10196]: 2009/06/26_11:06:55 info: Break tie for 2 nodes cluster
> crmd[10201]: 2009/06/26_11:06:55 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> crmd[10201]: 2009/06/26_11:06:55 info: mem_handle_event: no mbr_track info
> crmd[10201]: 2009/06/26_11:06:55 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> crmd[10201]: 2009/06/26_11:06:55 info: mem_handle_event: instance=1295, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> crmd[10201]: 2009/06/26_11:06:55 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=1295)
> crmd[10201]: 2009/06/26_11:06:55 info: ccm_event_detail: NEW MEMBERSHIP: trans=1295, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
> crmd[10201]: 2009/06/26_11:06:55 info: ccm_event_detail: CURRENT: trac-storage-2 [nodeid=1, born=1295]
> cib[10197]: 2009/06/26_11:06:55 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> cib[10197]: 2009/06/26_11:06:55 info: mem_handle_event: no mbr_track info
> cib[10197]: 2009/06/26_11:06:55 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> cib[10197]: 2009/06/26_11:06:55 info: mem_handle_event: instance=1295, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> cib[10197]: 2009/06/26_11:06:55 info: cib_ccm_msg_callback: PEER: trac-storage-2
> ccm[10196]: 2009/06/26_11:06:56 info: Break tie for 2 nodes cluster
> cib[10197]: 2009/06/26_11:06:56 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> cib[10197]: 2009/06/26_11:06:56 info: mem_handle_event: no mbr_track info

The ha.cf is the more appropriate file to check for this. If your
interface has intermittent problems like this that you can't fix,
you can increase the dead timeouts in ha.cf. Best of all is to have
reliable connections. If the problem is with Xen, you can file a bug
in their bugzilla. The network interface really shouldn't be going up
and down like a yo-yo.

Thanks,

Dejan

>
> my cib.xml :
>
> <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false"
>     num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" epoch="126"
>     num_updates="3" cib-last-written="Fri Jun 26 11:42:56 2009"
>     ccm_transition="2" dc_uuid="0c57668f-5a90-49bd-af4c-06987e8773a4">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>               value="2.1.4-node: aa909246edb386137b986c5773344b98c6969999"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="fcd92b39-cc52-4392-9c8e-f316c34070e6" uname="storage-1"
>           type="normal"/>
>       <node id="0c57668f-5a90-49bd-af4c-06987e8773a4" uname="storage-2"
>           type="normal"/>
>     </nodes>
>     <resources>
>       <primitive class="ocf" provider="heartbeat" type="IPaddr" id="ip0">
>         <instance_attributes id="ia-ip0">
>           <attributes>
>             <nvpair name="ip" id="ia-ip0-1" value="10.1.1.2"/>
>           </attributes>
>         </instance_attributes>
>       </primitive>
>       <master_slave id="ms-drbd0">
>         <meta_attributes id="ma-ms-drbd0">
>           <attributes>
>             <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
>             <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
>             <nvpair id="ma-ms-drbd0-6" name="globally_unique" value="false"/>
>             <nvpair id="ma-ms-drbd0-7" name="target_role" value="started"/>
>           </attributes>
>         </meta_attributes>
>         <primitive class="ocf" provider="heartbeat" type="drbd" id="drbd0">
>           <instance_attributes id="ia-drbd0">
>             <attributes>
>               <nvpair id="ia-drbd0-1" name="drbd_resource" value="r0"/>
>             </attributes>
>           </instance_attributes>
>           <operations>
>             <op name="monitor" timeout="10s" role="Master" id="op-drbd0-1"
>                 interval="20s"/>
>             <op name="monitor" timeout="10s" role="Slave" id="op-drbd0-2"
>                 interval="21s"/>
>           </operations>
>         </primitive>
>       </master_slave>
>       <primitive class="ocf" provider="heartbeat" type="Filesystem" id="fs0">
>         <instance_attributes id="ia-fs0">
>           <attributes>
>             <nvpair id="ia-fs0-1" name="fstype" value="reiserfs"/>
>             <nvpair id="ia-fs0-2" name="directory" value="/mnt/drbd"/>
>             <nvpair id="ia-fs0-3" name="device" value="/dev/drbd0"/>
>             <nvpair id="ia-fs0-4" name="options"
>                 value="rw,nosuid,nodev,noatime"/>
>           </attributes>
>         </instance_attributes>
>       </primitive>
>       <primitive id="nfsserver" class="lsb" type="nfs-kernel-server"/>
>     </resources>
>     <constraints>
>       <rsc_colocation id="ip_run" to="ms-drbd0" to_role="master" from="ip0"
>           score="infinity"/>
>       <rsc_colocation id="fs0_on_drbd0" to="ms-drbd0" to_role="master"
>           from="fs0" score="infinity"/>
>       <rsc_colocation id="nfs_run" to="ms-drbd0" to_role="master"
>           from="nfsserver" score="infinity"/>
>       <rsc_order id="start_fs0" from="fs0" action="start" to="ms-drbd0"
>           to_action="promote"/>
>       <rsc_order id="start_nfs" from="nfsserver" action="start" to="fs0"
>           type="after"/>
>       <rsc_order id="start_ip0" from="ip0" action="start" to="fs0"
>           type="after"/>
>       <rsc_order id="stop_fs0" from="fs0" action="stop" to="nfsserver"
>           type="after"/>
>     </constraints>
>   </configuration>
> </cib>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
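
For the dead-timeout suggestion in the reply above, a rough sketch of what the relevant part of ha.cf might look like. The directive names (keepalive, warntime, deadtime, initdead) are standard ha.cf settings, but every value here is an illustrative assumption, not taken from the poster's setup and not tuned for it:

```
# ha.cf fragment: timing directives only (values are assumptions)

# Seconds between heartbeat packets sent to the peer.
keepalive 2

# Seconds of silence before a "late heartbeat" warning is logged.
warntime 10

# Seconds of silence before the peer is declared dead.
# Raising this lets the cluster ride out short network outages,
# at the cost of slower failover when a node really is gone.
deadtime 30

# Dead time applied while the node is starting up;
# usually set to at least twice deadtime.
initdead 120
```

The warnings logged after warntime can help pick a deadtime: if late heartbeats show up regularly, deadtime should sit comfortably above the longest delay seen. Both nodes would normally carry the same timing values.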
