Hi all,

I am testing a 1+1 redundant system with heartbeat 2.1.4-2.1. After I switched the primary to standby manually (with hb_standby) several times, it stopped working. "cl_status rscstatus" always shows "transition", and this warning is recorded in the log file:

heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from test-srm-b ignored. Other side is in flux.

Is it a bug in heartbeat? Can I avoid it by tuning the parameters in ha.cf?

Thanks,
Leon

ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1
deadtime 5
warntime 3
initdead 30
udpport 694
baud 19200
serial /dev/ttyS0
auto_failback off
respawn hacluster /usr/lib/heartbeat/ipfail
debug 0
ucast eth1 10.1.1.1
node test-srm-a
node test-srm-b
ping_group sdvgroup 192.168.205.24
deadping 10
crm off

ha-debug:

heartbeat[29627]: 2008/12/12_17:05:59 info: test-srm-a wants to go standby [all]
heartbeat[29627]: 2008/12/12_17:06:00 info: standby: acquire [all] resources from test-srm-a
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is unstable.
heartbeat[7333]: 2008/12/12_17:06:00 info: acquire all HA resources (standby).
ResourceManager[7346]: 2008/12/12_17:06:00 info: Acquiring resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
IPaddr[7372]: 2008/12/12_17:06:00 INFO: Running OK
ResourceManager[7346]: 2008/12/12_17:06:00 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
GSRM has the following PIDs: 5599
ResourceManager[7346]: 2008/12/12_17:06:00 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
ResourceManager[7346]: 2008/12/12_17:06:00 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
EdgeManager has the following PIDs:
ResourceManager[7346]: 2008/12/12_17:06:00 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
heartbeat[7333]: 2008/12/12_17:06:00 info: all HA resource acquisition completed (standby).
heartbeat[29627]: 2008/12/12_17:06:00 info: Standby resource acquisition done [all].
heartbeat[29627]: 2008/12/12_17:06:00 info: remote resource transition completed.
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:01 info: test-srm-b wants to go standby [all]
heartbeat[29627]: 2008/12/12_17:37:02 info: standby: test-srm-a can take our all resources
heartbeat[7702]: 2008/12/12_17:37:02 info: give up all HA resources (standby).
ResourceManager[7715]: 2008/12/12_17:37:02 info: Releasing resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
EdgeManager has the following PIDs: 7525
ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop done. RC=0
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
GSRM has the following PIDs: 5599
ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop done. RC=0
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[7813]: 2008/12/12_17:37:02 INFO: ifconfig eth0:0 down
IPaddr[7796]: 2008/12/12_17:37:02 INFO: Success
INFO: Success
ResourceManager[7715]: 2008/12/12_17:37:02 debug: /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop done. RC=0
heartbeat[7702]: 2008/12/12_17:37:02 info: all HA resource release completed (standby).
heartbeat[29627]: 2008/12/12_17:37:02 info: Local standby process completed [all].
heartbeat[29627]: 2008/12/12_17:37:02 WARN: 1 lost packet(s) for [test-srm-a] [3241:3243]
heartbeat[29627]: 2008/12/12_17:37:02 info: remote resource transition completed.
heartbeat[29627]: 2008/12/12_17:37:02 info: No pkts missing from test-srm-a!
ipfail[29666]: 2008/12/12_17:37:02 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:02 info: Other node completed standby takeover of all resources.
ipfail[29666]: 2008/12/12_17:37:03 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:18 info: test-srm-a wants to go standby [all]
ipfail[29666]: 2008/12/12_17:37:19 debug: Other side is unstable.
heartbeat[29627]: 2008/12/12_17:37:19 info: standby: acquire [all] resources from test-srm-a
heartbeat[7855]: 2008/12/12_17:37:19 info: acquire all HA resources (standby).
ResourceManager[7868]: 2008/12/12_17:37:19 info: Acquiring resource group: test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM PrimaryMode.sh::EdgeManager
IPaddr[7894]: 2008/12/12_17:37:19 INFO: Resource is stopped
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated nic for 192.168.207.201: eth0
IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated netmask for 192.168.207.201: 255.255.252.0
IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Using calculated broadcast for 192.168.207.201: 192.168.207.255
IPaddr[7965]: 2008/12/12_17:37:19 INFO: eval ifconfig eth0:0 192.168.207.201netmask 255.255.252.0 broadcast 192.168.207.255
IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Sending Gratuitous Arp for 192.168.207.201 on eth0:0 [eth0]
IPaddr[7948]: 2008/12/12_17:37:19 INFO: Success
INFO: Success
ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/IPaddr 192.168.207.201 start done. RC=0
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
GSRM has the following PIDs: 5599
ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
EdgeManager has the following PIDs: 7525
ResourceManager[7868]: 2008/12/12_17:37:19 debug: /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
heartbeat[7855]: 2008/12/12_17:37:19 info: all HA resource acquisition completed (standby).
heartbeat[29627]: 2008/12/12_17:37:19 info: Standby resource acquisition done [all].
heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from test-srm-b ignored. Other side is in flux.
heartbeat[29627]: 2008/12/12_17:37:20 info: remote resource transition completed.
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:20 ERROR: Ignored standby message 'other' from test-srm-a in state 0
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:28 info: test-srm-b wants to go standby [all]
heartbeat[29627]: 2008/12/12_17:37:39 WARN: No reply to standby request. Standby request cancelled.
heartbeat[29627]: 2008/12/12_17:38:28 info: test-srm-b wants to go standby [all]
heartbeat[29627]: 2008/12/12_17:38:38 WARN: No reply to standby request. Standby request cancelled.
heartbeat[29627]: 2008/12/13_16:40:58 info: Daily informational memory statistics
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 8/294421 ms age 0 [pid29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/73 ms age 82942080 [pid29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484 [pid29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484 [pid29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 1/190354 ms age 0 [pid29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/86405 ms age 10 [pid29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0 0/0 [pid29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc bytes. pid [29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: These are nothing to worry about.
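
To make the timing in the ha-debug excerpt easier to see, here is a minimal Python sketch that measures how soon each "wants to go standby" request followed the previous "remote resource transition completed." entry. The regular expression, the function names (parse_entry, standby_gaps) and the SAMPLE list are my own illustrative choices, not part of heartbeat; the log lines in SAMPLE are copied from the excerpt above. It only reports intervals and does not by itself establish why the standby message was ignored.

```python
import re
from datetime import datetime

# One heartbeat log entry, e.g.
#   heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message ...
# (pattern inferred from the log excerpt, not from heartbeat documentation)
ENTRY = re.compile(
    r"^(?P<proc>[\w.]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)
STANDBY = re.compile(r"^(?P<node>\S+) wants to go standby")
DONE = "remote resource transition completed."


def parse_entry(line):
    """Return (timestamp, level, message), or None for lines without a log prefix."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S")
    return ts, m.group("level"), m.group("msg")


def standby_gaps(lines):
    """For each standby request, seconds elapsed since the last completed transition."""
    gaps = []
    last_done = None
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # script output such as "GSRM has the following PIDs: 5599"
        ts, _level, msg = entry
        if msg == DONE:
            last_done = ts
            continue
        req = STANDBY.match(msg)
        if req and last_done is not None:
            gaps.append((req.group("node"), (ts - last_done).total_seconds()))
    return gaps


# Lines copied from the ha-debug excerpt in this message.
SAMPLE = [
    "heartbeat[29627]: 2008/12/12_17:37:02 info: remote resource transition completed.",
    "heartbeat[29627]: 2008/12/12_17:37:18 info: test-srm-a wants to go standby [all]",
    "heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from test-srm-b ignored. Other side is in flux.",
    "heartbeat[29627]: 2008/12/12_17:37:20 info: remote resource transition completed.",
    "heartbeat[29627]: 2008/12/12_17:37:28 info: test-srm-b wants to go standby [all]",
    "heartbeat[29627]: 2008/12/12_17:37:39 WARN: No reply to standby request. Standby request cancelled.",
]

if __name__ == "__main__":
    for node, seconds in standby_gaps(SAMPLE):
        print(f"{node}: standby requested {seconds:.0f}s after the last completed transition")
```

To run it against the full log, pass the lines of /var/log/ha-debug to standby_gaps() instead of SAMPLE.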
