Re: [Linux-HA] network failure and recovery problem

Artur Kamiński Sun, 06 Sep 2009 08:20:21 -0700

Artur Kamiński wrote:
> Hi,
>
> Sometimes my network fails. If the network goes down and comes back up
> after a few seconds, the HA cluster does not recover. I do not want to
> set a high deadtime. How can I fix this?
>
> crm_mon shows:
> Node: storage-2 (88d7ff6f-d400-40ef-a215-8cc7a6d29072): OFFLINE
> Node: storage-1 (7bce6375-3a7f-4ea1-9586-5b6f4c027190): online
>
> ha.cf:
> keepalive 1
> deadtime 10
>
> warntime 3
>
> initdead 15
> ....
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> apiauth ping gid=root uid=root
>
>
>
> Version: heartbeat-2_2.1.4-7~bpo50+1_all.deb
>
>
> From the log:
>
>
> heartbeat[11791]: 2009/09/06_16:40:38 info: Link storage-2:eth1 up.
> heartbeat[11791]: 2009/09/06_16:40:38 info: Link 10.1.131.65:10.1.131.65 up.
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 
> 10.1.131.65: interval 97770 ms
> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node 
> 10.1.131.65: status ping
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status 
> update: Ping node storage-2 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status 
> update: Ping node storage-2 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status 
> update: Ping node 10.1.131.65 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status 
> update: Ping node 10.1.131.65 now has status [up]
> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status 
> update: Ping node 10.1.131.65 now has status [ping]
> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
> heartbeat[11791]: 2009/09/06_16:40:38 CRIT: Cluster node storage-2 
> returning after partition.
> heartbeat[11791]: 2009/09/06_16:40:38 info: For information on cluster 
> partitions, See URL: http://linux-ha.org/SplitBrain
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Deadtime value may be too small.
> heartbeat[11791]: 2009/09/06_16:40:38 info: See FAQ for information on 
> tuning deadtime.
> heartbeat[11791]: 2009/09/06_16:40:38 info: URL: 
> http://linux-ha.org/FAQ#heavy_load
> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 
> storage-2: interval 97760 ms
> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node 
> storage-2: status active
> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status 
> update: Ping node storage-2 now has status [active]
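For reference, the gap reported in the log above can be compared with the ha.cf values using a small sketch. It assumes that warntime and deadtime are plain seconds, as in the ha.cf above, and that the "interval" in the "Late heartbeat" message is the time since the last heartbeat from that node. The helper name classify_interval is made up for illustration and is not part of heartbeat.

```python
# Values taken from the ha.cf quoted above (seconds).
KEEPALIVE_S = 1
WARNTIME_S = 3
DEADTIME_S = 10


def classify_interval(interval_ms, warntime_s=WARNTIME_S, deadtime_s=DEADTIME_S):
    """Classify a heartbeat gap against warntime/deadtime.

    Hypothetical helper for illustration only; the exact comparison
    heartbeat performs internally is an assumption here.
    """
    interval_s = interval_ms / 1000.0
    if interval_s >= deadtime_s:
        return "dead"
    if interval_s >= warntime_s:
        return "late"
    return "ok"


if __name__ == "__main__":
    # 97760 ms is the interval logged for storage-2 above.
    for gap_ms in (1000, 4000, 97760):
        print(gap_ms, "ms ->", classify_interval(gap_ms))
```

If I read the log correctly, the gap of about 97 seconds is far beyond deadtime 10, so storage-2 had already been declared dead before the link came back, which matches the "returning after partition" message.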
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>   
And this also appears in the log:


cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: Got an event 
OC_EV_MS_INVALID from ccm
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: no mbr_track info
crmd[28891]: 2009/09/06_17:18:33 info: crmd_ccm_msg_callback: Quorum 
(re)attained after event=NEW MEMBERSHIP (id=271)
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: Got an event 
OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[28887]: 2009/09/06_17:18:33 info: mem_handle_event: instance=271, 
nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
cib[28887]: 2009/09/06_17:18:33 info: cib_ccm_msg_callback: PEER: storage-1
crmd[28891]: 2009/09/06_17:18:33 info: ccm_event_detail: NEW MEMBERSHIP: 
trans=271, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:33 info: ccm_event_detail:        CURRENT: 
storage-1 [nodeid=0, born=271]
crmd[28891]: 2009/09/06_17:18:34 WARN: crmd_ha_msg_callback: Ignoring HA 
message (op=noop) from storage-2: not in our membership list (size=1)
ccm[28886]: 2009/09/06_17:18:34 info: Break tie for 2 nodes cluster
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event 
OC_EV_MS_INVALID from ccm
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: no mbr_track info
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event 
OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[28891]: 2009/09/06_17:18:34 info: mem_handle_event: instance=272, 
nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:34 info: crmd_ccm_msg_callback: Quorum 
(re)attained after event=NEW MEMBERSHIP (id=272)
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event 
OC_EV_MS_INVALID from ccm
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: no mbr_track info
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: Got an event 
OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[28887]: 2009/09/06_17:18:34 info: mem_handle_event: instance=272, 
nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
cib[28887]: 2009/09/06_17:18:34 info: cib_ccm_msg_callback: PEER: storage-1
crmd[28891]: 2009/09/06_17:18:34 info: ccm_event_detail: NEW MEMBERSHIP: 
trans=272, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
crmd[28891]: 2009/09/06_17:18:34 info: ccm_event_detail:        CURRENT: 
storage-1 [nodeid=0, born=272]
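
To see at a glance whether the membership ever grows back to two nodes, the ccm_event_detail lines can be summarised with a short script. This is only a sketch: it assumes each log entry is on a single line (the lines above are wrapped by the mail client), and parse_memberships is a made-up helper name, not a heartbeat tool.

```python
import re

# Matches e.g. "NEW MEMBERSHIP: trans=271, nodes=1, new=0, lost=0 n_idx=0, ..."
MEMBERSHIP_RE = re.compile(
    r"NEW MEMBERSHIP:\s*trans=(\d+),\s*nodes=(\d+),\s*new=(\d+),\s*lost=(\d+)"
)


def parse_memberships(lines):
    """Return (trans, nodes, new, lost) tuples, one per membership event."""
    events = []
    for line in lines:
        match = MEMBERSHIP_RE.search(line)
        if match:
            events.append(tuple(int(g) for g in match.groups()))
    return events


# Entries copied from the log above, joined back into single lines.
SAMPLE = [
    "crmd[28891]: 2009/09/06_17:18:33 info: ccm_event_detail: NEW MEMBERSHIP: "
    "trans=271, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3",
    "crmd[28891]: 2009/09/06_17:18:34 WARN: crmd_ha_msg_callback: Ignoring HA "
    "message (op=noop) from storage-2: not in our membership list (size=1)",
    "crmd[28891]: 2009/09/06_17:18:34 info: ccm_event_detail: NEW MEMBERSHIP: "
    "trans=272, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3",
]

if __name__ == "__main__":
    for trans, nodes, new, lost in parse_memberships(SAMPLE):
        print(f"trans={trans} nodes={nodes} new={new} lost={lost}")
```

Both events report nodes=1, so storage-1 keeps forming a one-node membership even though messages from storage-2 are arriving again.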

