Hi,

On Mon, Dec 15, 2008 at 10:21:58AM +0800, Yishi Li wrote:
> Hi all
> I am testing a 1+1 redundant system with heartbeat 2.1.4-2.1.
> After I switched primary to standby manually (with hb_standby) several times,
> it didn't work any more. "cl_status rscstatus" always shows "transition".
> The following warning is recorded in the log file:
> "heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from test-srm-b ignored. Other side is in flux."
>
> Is it a bug in heartbeat?
> Can I avoid it by tuning the parameters in ha.cf?
>
> Thanks,
> Leon
>
> ha.cf
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> keepalive 1
> deadtime 5
> warntime 3
> initdead 30
> udpport 694
> baud 19200
> serial /dev/ttyS0
> auto_failback off
> respawn hacluster /usr/lib/heartbeat/ipfail
> debug 0
> ucast eth1 10.1.1.1
> node test-srm-a
> node test-srm-b
> ping_group sdvgroup 192.168.205.24
> deadping 10
> crm off
>
> ha-debug:
>
> heartbeat[29627]: 2008/12/12_17:05:59 info: test-srm-a wants to go standby [all]
> heartbeat[29627]: 2008/12/12_17:06:00 info: standby: acquire [all] resources from test-srm-a
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is unstable.

Looks like some nodes have problems pinging the ping hosts. Check
your network/interface statistics.

Thanks,

Dejan

> heartbeat[7333]: 2008/12/12_17:06:00 info: acquire all HA resources (standby).
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Acquiring resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
> IPaddr[7372]: 2008/12/12_17:06:00 INFO: Running OK
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> GSRM has the following PIDs: 5599
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> EdgeManager has the following PIDs:
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
> heartbeat[7333]: 2008/12/12_17:06:00 info: all HA resource acquisition completed (standby).
> heartbeat[29627]: 2008/12/12_17:06:00 info: Standby resource acquisition done [all].
> heartbeat[29627]: 2008/12/12_17:06:00 info: remote resource transition completed.
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:01 info: test-srm-b wants to go standby [all]
> heartbeat[29627]: 2008/12/12_17:37:02 info: standby: test-srm-a can take our all resources
> heartbeat[7702]: 2008/12/12_17:37:02 info: give up all HA resources (standby).
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Releasing resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
> EdgeManager has the following PIDs: 7525
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop done. RC=0
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
> GSRM has the following PIDs: 5599
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop done. RC=0
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
> In IP Stop
> SIOCDELRT: No such process
> IPaddr[7813]: 2008/12/12_17:37:02 INFO: ifconfig eth0:0 down
> IPaddr[7796]: 2008/12/12_17:37:02 INFO: Success
> INFO: Success
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop done. RC=0
> heartbeat[7702]: 2008/12/12_17:37:02 info: all HA resource release completed (standby).
> heartbeat[29627]: 2008/12/12_17:37:02 info: Local standby process completed [all].
> heartbeat[29627]: 2008/12/12_17:37:02 WARN: 1 lost packet(s) for [test-srm-a] [3241:3243]
> heartbeat[29627]: 2008/12/12_17:37:02 info: remote resource transition completed.
> heartbeat[29627]: 2008/12/12_17:37:02 info: No pkts missing from test-srm-a!
> ipfail[29666]: 2008/12/12_17:37:02 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:02 info: Other node completed standby takeover of all resources.
> ipfail[29666]: 2008/12/12_17:37:03 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:18 info: test-srm-a wants to go standby [all]
> ipfail[29666]: 2008/12/12_17:37:19 debug: Other side is unstable.
> heartbeat[29627]: 2008/12/12_17:37:19 info: standby: acquire [all] resources from test-srm-a
> heartbeat[7855]: 2008/12/12_17:37:19 info: acquire all HA resources (standby).
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Acquiring resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
> IPaddr[7894]: 2008/12/12_17:37:19 INFO: Resource is stopped
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated nic for 192.168.207.201: eth0
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated netmask for 192.168.207.201: 255.255.252.0
> IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Using calculated broadcast for 192.168.207.201: 192.168.207.255
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: eval ifconfig eth0:0 192.168.207.201netmask 255.255.252.0 broadcast 192.168.207.255
> IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Sending Gratuitous Arp for 192.168.207.201 on eth0:0 [eth0]
> IPaddr[7948]: 2008/12/12_17:37:19 INFO: Success
> INFO: Success
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/IPaddr 192.168.207.201 start done. RC=0
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> GSRM has the following PIDs: 5599
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> EdgeManager has the following PIDs: 7525
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
> heartbeat[7855]: 2008/12/12_17:37:19 info: all HA resource acquisition completed (standby).
> heartbeat[29627]: 2008/12/12_17:37:19 info: Standby resource acquisition done [all].
> heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from test-srm-b ignored. Other side is in flux.
> heartbeat[29627]: 2008/12/12_17:37:20 info: remote resource transition completed.
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:20 ERROR: Ignored standby message 'other' from test-srm-a in state 0
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:28 info: test-srm-b wants to go standby [all]
> heartbeat[29627]: 2008/12/12_17:37:39 WARN: No reply to standby request. Standby request cancelled.
> heartbeat[29627]: 2008/12/12_17:38:28 info: test-srm-b wants to go standby [all]
> heartbeat[29627]: 2008/12/12_17:38:38 WARN: No reply to standby request. Standby request cancelled.
> heartbeat[29627]: 2008/12/13_16:40:58 info: Daily informational memory statistics
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 8/294421 ms age 0 [pid29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/73 ms age 82942080 [pid29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484 [pid29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484 [pid29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 1/190354 ms age 0 [pid29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/86405 ms age 10 [pid29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: These are nothing to worry about.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
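
One possible way to act on the suggestion to check the interface statistics, assuming the nodes run Linux, is to look at the per-interface error and drop counters in /proc/net/dev. The sketch below is only an illustration: the function names (parse_net_dev, noisy_interfaces) are made up for this example, and it reports every interface it finds rather than assuming which one (for instance the eth0 or eth1 seen in the log and ha.cf above) carries the ping traffic.

```python
from pathlib import Path

# Column positions after the "iface:" prefix in Linux /proc/net/dev:
# 8 receive counters followed by 8 transmit counters.
RX_ERRS, RX_DROP, TX_ERRS, TX_DROP = 2, 3, 10, 11


def parse_net_dev(text):
    """Return {interface: {counter: value}} parsed from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:  # the two header lines contain no colon
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:  # skip anything that is not a full counter row
            continue
        stats[name.strip()] = {
            "rx_errs": int(fields[RX_ERRS]),
            "rx_drop": int(fields[RX_DROP]),
            "tx_errs": int(fields[TX_ERRS]),
            "tx_drop": int(fields[TX_DROP]),
        }
    return stats


def noisy_interfaces(stats):
    """Sorted names of interfaces with any non-zero error or drop counter."""
    return sorted(name for name, c in stats.items() if any(c.values()))


if __name__ == "__main__":
    proc = Path("/proc/net/dev")  # only present on Linux
    if proc.exists():
        stats = parse_net_dev(proc.read_text())
        for name, counters in stats.items():
            print(name, counters)
        print("interfaces with errors or drops:", noisy_interfaces(stats))
```

Running it on both nodes a few minutes apart and comparing the counters would show whether errors or drops are still increasing; a static non-zero value may just be left over from an earlier event.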
