Artur Kamiński wrote:
> Hi,
>
> Sometimes my network fails. If the network goes down and comes back up
> after a few seconds, the HA cluster does not recover. I do not want to
> set a high deadtime. How can I fix this?
>
> In crm_mon:
> Node: storage-2 (88d7ff6f-d400-40ef-a215-8cc7a6d29072): OFFLINE
> Node: storage-1 (7bce6375-3a7f-4ea1-9586-5b6f4c027190): online
>
> ha.cf:
> keepalive 1
> deadtime 10
> warntime 3
> initdead 15
> ....
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> apiauth ping gid=root uid=root
>
> Version: heartbeat-2_2.1.4-7~bpo50+1_all.deb
>
> In the log:
>
> heartbeat[11791]: 2009/09/06_16:40:38 info: Link storage-2:eth1 up.
> heartbeat[11791]: 2009/09/06_16:40:38 info: Link 10.1.131.65:10.1.131.65 up.
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 10.1.131.65: interval 97770 ms
> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node 10.1.131.65: status ping
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status update: Ping node storage-2 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node storage-2 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status update: Ping node 10.1.131.65 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node 10.1.131.65 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node 10.1.131.65 now has status [ping]
> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
> heartbeat[11791]: 2009/09/06_16:40:38 CRIT: Cluster node storage-2 returning after partition.
> heartbeat[11791]: 2009/09/06_16:40:38 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Deadtime value may be too small.
> heartbeat[11791]: 2009/09/06_16:40:38 info: See FAQ for information on tuning deadtime.
> heartbeat[11791]: 2009/09/06_16:40:38 info: URL: http://linux-ha.org/FAQ#heavy_load
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node storage-2: interval 97760 ms
> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node storage-2: status active
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node storage-2 now has status [active]
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

and:

cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: no mbr_track info
crmd[28891]: 2009/09/06_17:18:33 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=271)
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: instance=271, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
cib[28887]: 2009/09/06_17:18:33 info: cib_ccm_msg_callback: PEER: storage-1
crmd[28891]: 2009/09/06_17:18:33 info: ccm_event_detail: NEW MEMBERSHIP: trans=271, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:33 info: ccm_event_detail: CURRENT: storage-1 [nodeid=0, born=271]
crmd[28891]: 2009/09/06_17:18:34 WARN: crmd_ha_msg_callback: Ignoring HA message (op=noop) from storage-2: not in our membership list (size=1)
ccm[28886]: 2009/09/06_17:18:34 info: Break tie for 2 nodes cluster
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: no mbr_track info
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: instance=272, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:34 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=272)
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: no mbr_track info
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: instance=272, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
cib[28887]: 2009/09/06_17:18:34 info: cib_ccm_msg_callback: PEER: storage-1
crmd[28891]: 2009/09/06_17:18:34 info: ccm_event_detail: NEW MEMBERSHIP: trans=272, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:34 info: ccm_event_detail: CURRENT: storage-1 [nodeid=0, born=272]
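
For what it's worth, here is a rough sketch (plain Python, not part of heartbeat) of how the "Late heartbeat" warnings from the first log can be compared with the deadtime from ha.cf. heartbeat declares a node dead once nothing has been heard from it for deadtime seconds, so any interval above that value means the peer had already been marked dead before the link came back. The function names and the sample text are made up for illustration, and the pattern assumes each log entry sits on a single line (not wrapped by the mail client).

```python
import re

# Hypothetical helper, not a heartbeat tool: matches lines such as
# "WARN: Late heartbeat: Node storage-2: interval 97760 ms"
LATE_RE = re.compile(r"Late heartbeat: Node ([^\s:]+): interval (\d+) ms")


def late_heartbeats(log_text):
    """Return (node, interval_ms) for every 'Late heartbeat' warning."""
    return [(m.group(1), int(m.group(2))) for m in LATE_RE.finditer(log_text)]


def exceeds_deadtime(log_text, deadtime_s):
    """Keep only the warnings whose interval is longer than deadtime."""
    limit_ms = deadtime_s * 1000
    return [(node, ms) for node, ms in late_heartbeats(log_text) if ms > limit_ms]


# Two entries copied from the log quoted above, joined onto single lines.
SAMPLE_LOG = """\
heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 10.1.131.65: interval 97770 ms
heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node storage-2: interval 97760 ms
"""

if __name__ == "__main__":
    # deadtime 10 is the value from the ha.cf quoted above
    for node, ms in exceeds_deadtime(SAMPLE_LOG, deadtime_s=10):
        print(f"{node}: no heartbeat for {ms / 1000:.2f} s, deadtime is 10 s")
```

With the values above, both entries are far beyond the 10 second deadtime. This only measures how long the link was silent; it does not by itself explain why storage-2 stays OFFLINE after the link returns.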
