Hi,

On Mon, Dec 15, 2008 at 10:21:58AM +0800, Yishi Li wrote:
> Hi all
> I am testing a 1+1 redundant system with heartbeat 2.1.4-2.1.
> After I switched primary to standby manually (with hb_standby) several times,
> it stopped working. "cl_status rscstatus" always shows "transition".
> The following warning is recorded in the log file:
> "heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from
> test-srm-b ignored.  Other side is in flux."
> 
> Is it a bug in heartbeat?
> Can I avoid it by tuning the parameters in ha.cf?
> 
> Thanks,
> Leon
> 
> ha.cf
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> keepalive 1
> deadtime 5
> warntime 3
> initdead 30
> udpport 694
> baud 19200
> serial /dev/ttyS0
> auto_failback off
> respawn hacluster /usr/lib/heartbeat/ipfail
> debug 0
> ucast eth1 10.1.1.1
> node test-srm-a
> node test-srm-b
> ping_group sdvgroup 192.168.205.24
> deadping 10
> crm off
> 
> ha-debug:
> 
> heartbeat[29627]: 2008/12/12_17:05:59 info: test-srm-a wants to go standby
> [all]
> heartbeat[29627]: 2008/12/12_17:06:00 info: standby: acquire [all] resources
> from test-srm-a
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is unstable.

Looks like some nodes have problems pinging the ping hosts. Check
your network/interface statistics.
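
One way to look at the interface statistics is to read the error and
drop counters from /proc/net/dev. The following is only a minimal
sketch, assuming a Linux host with the usual /proc/net/dev layout (16
numeric columns per interface). The function names are hypothetical
and not part of heartbeat.

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {iface: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        values = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
            "tx_carrier": values[14],
        }
    return stats


def interfaces_with_problems(stats):
    """Return the sorted names of interfaces with any non-zero counter."""
    return sorted(
        name for name, counters in stats.items()
        if any(value != 0 for value in counters.values())
    )


if __name__ == "__main__":
    # Reads the live counters; run it on both nodes.
    with open("/proc/net/dev") as handle:
        current = parse_proc_net_dev(handle.read())
    for iface in interfaces_with_problems(current):
        print(iface, current[iface])
```

The counters are cumulative since boot, so it is more useful to compare
two readings taken a few minutes apart than to look at a single one.
Growing values on eth1 (the ucast link in your ha.cf) or on the
interface that reaches the ping_group host would be worth a closer look.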

Thanks,

Dejan

> heartbeat[7333]: 2008/12/12_17:06:00 info: acquire all HA resources
> (standby).
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Acquiring resource group:
> test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
> PrimaryMode.sh::EdgeManager
> IPaddr[7372]: 2008/12/12_17:06:00 INFO:  Running OK
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> GSRM has the following PIDs: 5599
> ResourceManager[7346]: 2008/12/12_17:06:00 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
> ResourceManager[7346]: 2008/12/12_17:06:00 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> EdgeManager has the following PIDs:
> ResourceManager[7346]: 2008/12/12_17:06:00 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
> heartbeat[7333]: 2008/12/12_17:06:00 info: all HA resource acquisition
> completed (standby).
> heartbeat[29627]: 2008/12/12_17:06:00 info: Standby resource acquisition
> done [all].
> heartbeat[29627]: 2008/12/12_17:06:00 info: remote resource transition
> completed.
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
> ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:01 info: test-srm-b wants to go standby
> [all]
> heartbeat[29627]: 2008/12/12_17:37:02 info: standby: test-srm-a can take our
> all resources
> heartbeat[7702]: 2008/12/12_17:37:02 info: give up all HA resources
> (standby).
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Releasing resource group:
> test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
> PrimaryMode.sh::EdgeManager
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
> EdgeManager has the following PIDs: 7525
> ResourceManager[7715]: 2008/12/12_17:37:02 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop done. RC=0
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
> GSRM has the following PIDs: 5599
> ResourceManager[7715]: 2008/12/12_17:37:02 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM stop done. RC=0
> ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
> ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
> In IP Stop
> SIOCDELRT: No such process
> IPaddr[7813]: 2008/12/12_17:37:02 INFO: ifconfig eth0:0 down
> IPaddr[7796]: 2008/12/12_17:37:02 INFO:  Success
> INFO:  Success
> ResourceManager[7715]: 2008/12/12_17:37:02 debug:
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 stop done. RC=0
> heartbeat[7702]: 2008/12/12_17:37:02 info: all HA resource release completed
> (standby).
> heartbeat[29627]: 2008/12/12_17:37:02 info: Local standby process completed
> [all].
> heartbeat[29627]: 2008/12/12_17:37:02 WARN: 1 lost packet(s) for
> [test-srm-a] [3241:3243]
> heartbeat[29627]: 2008/12/12_17:37:02 info: remote resource transition
> completed.
> heartbeat[29627]: 2008/12/12_17:37:02 info: No pkts missing from test-srm-a!
> ipfail[29666]: 2008/12/12_17:37:02 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:02 info: Other node completed standby
> takeover of all resources.
> ipfail[29666]: 2008/12/12_17:37:03 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:18 info: test-srm-a wants to go standby
> [all]
> ipfail[29666]: 2008/12/12_17:37:19 debug: Other side is unstable.
> heartbeat[29627]: 2008/12/12_17:37:19 info: standby: acquire [all] resources
> from test-srm-a
> heartbeat[7855]: 2008/12/12_17:37:19 info: acquire all HA resources
> (standby).
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Acquiring resource group:
> test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
> PrimaryMode.sh::EdgeManager
> IPaddr[7894]: 2008/12/12_17:37:19 INFO:  Resource is stopped
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 start
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated nic for
> 192.168.207.201: eth0
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated netmask for
> 192.168.207.201: 255.255.252.0
> IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Using calculated broadcast for
> 192.168.207.201: 192.168.207.255
> IPaddr[7965]: 2008/12/12_17:37:19 INFO: eval ifconfig eth0:0
> 192.168.207.201netmask
> 255.255.252.0 broadcast 192.168.207.255
> IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Sending Gratuitous Arp for
> 192.168.207.201 on eth0:0 [eth0]
> IPaddr[7948]: 2008/12/12_17:37:19 INFO:  Success
> INFO:  Success
> ResourceManager[7868]: 2008/12/12_17:37:19 debug:
> /etc/ha.d/resource.d/IPaddr 192.168.207.201 start done. RC=0
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start
> GSRM has the following PIDs: 5599
> ResourceManager[7868]: 2008/12/12_17:37:19 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
> ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
> EdgeManager has the following PIDs: 7525
> ResourceManager[7868]: 2008/12/12_17:37:19 debug:
> /etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
> heartbeat[7855]: 2008/12/12_17:37:19 info: all HA resource acquisition
> completed (standby).
> heartbeat[29627]: 2008/12/12_17:37:19 info: Standby resource acquisition
> done [all].
> heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from
> test-srm-b ignored.  Other side is in flux.
> heartbeat[29627]: 2008/12/12_17:37:20 info: remote resource transition
> completed.
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:20 ERROR: Ignored standby message 'other'
> from test-srm-a in state 0
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
> heartbeat[29627]: 2008/12/12_17:37:28 info: test-srm-b wants to go standby
> [all]
> heartbeat[29627]: 2008/12/12_17:37:39 WARN: No reply to standby request.
> Standby request cancelled.
> heartbeat[29627]: 2008/12/12_17:38:28 info: test-srm-b wants to go standby
> [all]
> heartbeat[29627]: 2008/12/12_17:38:38 WARN: No reply to standby request.
> Standby request cancelled.
> heartbeat[29627]: 2008/12/13_16:40:58 info: Daily informational memory
> statistics
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 8/294421 ms age 0
> [pid29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29627/MST_CONTROL]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/73 ms age 82942080
> [pid29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29629/HBFIFO]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484
> [pid29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29630/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484
> [pid29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29631/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 1/190354 ms age 0
> [pid29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29632/HBWRITE]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/86405 ms age 10
> [pid29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
> [pid29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
> bytes. pid [29633/HBREAD]
> heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
> heartbeat[29627]: 2008/12/13_16:40:58 info: These are nothing to worry
> about.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems