Hi all,
I am testing a 1+1 redundant system with heartbeat 2.1.4-2.1.
After I manually switched the primary to standby (with hb_standby) several times,
it stopped working: "cl_status rscstatus" now always shows "transition".
The following warning is recorded in the log file:
"heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from
test-srm-b ignored.  Other side is in flux."

Is this a bug in heartbeat?
Can I avoid it by tuning the parameters in ha.cf?
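
One way to rule out switching too quickly would be to wait until "cl_status rscstatus" no longer reports "transition" before sending the next hb_standby. Below is a minimal sketch in Python, not the procedure actually used in this test. It assumes cl_status is on the PATH, and the function names (read_rscstatus, wait_until_stable) are made up for illustration.

```python
import subprocess
import time


def read_rscstatus():
    """Return the output of `cl_status rscstatus` (assumes cl_status is on PATH)."""
    result = subprocess.run(
        ["cl_status", "rscstatus"], capture_output=True, text=True
    )
    return result.stdout.strip()


def wait_until_stable(get_status=read_rscstatus, timeout=60.0, interval=1.0,
                      sleep=time.sleep):
    """Poll until the resource status is no longer "transition".

    Returns the first stable status, or None if `timeout` seconds of
    polling pass without the status settling.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status and status != "transition":
            return status
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval
```

The next hb_standby would only be issued when wait_until_stable() returns something other than None. This would not clear the stuck state described above; it would only avoid sending a request while the other side is still in flux.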

Thanks,
Leon

ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1
deadtime 5
warntime 3
initdead 30
udpport 694
baud 19200
serial /dev/ttyS0
auto_failback off
respawn hacluster /usr/lib/heartbeat/ipfail
debug 0
ucast eth1 10.1.1.1
node test-srm-a
node test-srm-b
ping_group sdvgroup 192.168.205.24
deadping 10
crm off

ha-debug:

heartbeat[29627]: 2008/12/12_17:05:59 info: test-srm-a wants to go standby
[all]
heartbeat[29627]: 2008/12/12_17:06:00 info: standby: acquire [all] resources
from test-srm-a
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is unstable.
heartbeat[7333]: 2008/12/12_17:06:00 info: acquire all HA resources
(standby).
ResourceManager[7346]: 2008/12/12_17:06:00 info: Acquiring resource group:
test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
PrimaryMode.sh::EdgeManager
IPaddr[7372]: 2008/12/12_17:06:00 INFO:  Running OK
ResourceManager[7346]: 2008/12/12_17:06:00 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start
ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start
GSRM has the following PIDs: 5599
ResourceManager[7346]: 2008/12/12_17:06:00 debug:
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
ResourceManager[7346]: 2008/12/12_17:06:00 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
ResourceManager[7346]: 2008/12/12_17:06:00 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
EdgeManager has the following PIDs:
ResourceManager[7346]: 2008/12/12_17:06:00 debug:
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
heartbeat[7333]: 2008/12/12_17:06:00 info: all HA resource acquisition
completed (standby).
heartbeat[29627]: 2008/12/12_17:06:00 info: Standby resource acquisition
done [all].
heartbeat[29627]: 2008/12/12_17:06:00 info: remote resource transition
completed.
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
ipfail[29666]: 2008/12/12_17:06:00 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:01 info: test-srm-b wants to go standby
[all]
heartbeat[29627]: 2008/12/12_17:37:02 info: standby: test-srm-a can take our
all resources
heartbeat[7702]: 2008/12/12_17:37:02 info: give up all HA resources
(standby).
ResourceManager[7715]: 2008/12/12_17:37:02 info: Releasing resource group:
test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
PrimaryMode.sh::EdgeManager
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop
EdgeManager has the following PIDs: 7525
ResourceManager[7715]: 2008/12/12_17:37:02 debug:
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager stop done. RC=0
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh GSRM stop
GSRM has the following PIDs: 5599
ResourceManager[7715]: 2008/12/12_17:37:02 debug:
/etc/ha.d/resource.d/PrimaryMode.sh GSRM stop done. RC=0
ResourceManager[7715]: 2008/12/12_17:37:02 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
ResourceManager[7715]: 2008/12/12_17:37:02 debug: Starting
/etc/ha.d/resource.d/IPaddr 192.168.207.201 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[7813]: 2008/12/12_17:37:02 INFO: ifconfig eth0:0 down
IPaddr[7796]: 2008/12/12_17:37:02 INFO:  Success
INFO:  Success
ResourceManager[7715]: 2008/12/12_17:37:02 debug:
/etc/ha.d/resource.d/IPaddr 192.168.207.201 stop done. RC=0
heartbeat[7702]: 2008/12/12_17:37:02 info: all HA resource release completed
(standby).
heartbeat[29627]: 2008/12/12_17:37:02 info: Local standby process completed
[all].
heartbeat[29627]: 2008/12/12_17:37:02 WARN: 1 lost packet(s) for
[test-srm-a] [3241:3243]
heartbeat[29627]: 2008/12/12_17:37:02 info: remote resource transition
completed.
heartbeat[29627]: 2008/12/12_17:37:02 info: No pkts missing from test-srm-a!
ipfail[29666]: 2008/12/12_17:37:02 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:02 info: Other node completed standby
takeover of all resources.
ipfail[29666]: 2008/12/12_17:37:03 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:18 info: test-srm-a wants to go standby
[all]
ipfail[29666]: 2008/12/12_17:37:19 debug: Other side is unstable.
heartbeat[29627]: 2008/12/12_17:37:19 info: standby: acquire [all] resources
from test-srm-a
heartbeat[7855]: 2008/12/12_17:37:19 info: acquire all HA resources
(standby).
ResourceManager[7868]: 2008/12/12_17:37:19 info: Acquiring resource group:
test-srm-a IPaddr::192.168.207.201 PrimaryMode.sh::GSRM
PrimaryMode.sh::EdgeManager
IPaddr[7894]: 2008/12/12_17:37:19 INFO:  Resource is stopped
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.207.201 start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
/etc/ha.d/resource.d/IPaddr 192.168.207.201 start
IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated nic for
192.168.207.201: eth0
IPaddr[7965]: 2008/12/12_17:37:19 INFO: Using calculated netmask for
192.168.207.201: 255.255.252.0
IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Using calculated broadcast for
192.168.207.201: 192.168.207.255
IPaddr[7965]: 2008/12/12_17:37:19 INFO: eval ifconfig eth0:0
192.168.207.201 netmask 255.255.252.0 broadcast 192.168.207.255
IPaddr[7965]: 2008/12/12_17:37:19 DEBUG: Sending Gratuitous Arp for
192.168.207.201 on eth0:0 [eth0]
IPaddr[7948]: 2008/12/12_17:37:19 INFO:  Success
INFO:  Success
ResourceManager[7868]: 2008/12/12_17:37:19 debug:
/etc/ha.d/resource.d/IPaddr 192.168.207.201 start done. RC=0
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start
GSRM has the following PIDs: 5599
ResourceManager[7868]: 2008/12/12_17:37:19 debug:
/etc/ha.d/resource.d/PrimaryMode.sh GSRM start done. RC=0
ResourceManager[7868]: 2008/12/12_17:37:19 info: Running
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
ResourceManager[7868]: 2008/12/12_17:37:19 debug: Starting
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start
EdgeManager has the following PIDs: 7525
ResourceManager[7868]: 2008/12/12_17:37:19 debug:
/etc/ha.d/resource.d/PrimaryMode.sh EdgeManager start done. RC=0
heartbeat[7855]: 2008/12/12_17:37:19 info: all HA resource acquisition
completed (standby).
heartbeat[29627]: 2008/12/12_17:37:19 info: Standby resource acquisition
done [all].
heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from
test-srm-b ignored.  Other side is in flux.
heartbeat[29627]: 2008/12/12_17:37:20 info: remote resource transition
completed.
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:20 ERROR: Ignored standby message 'other'
from test-srm-a in state 0
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.
heartbeat[29627]: 2008/12/12_17:37:28 info: test-srm-b wants to go standby
[all]
heartbeat[29627]: 2008/12/12_17:37:39 WARN: No reply to standby request.
Standby request cancelled.
heartbeat[29627]: 2008/12/12_17:38:28 info: test-srm-b wants to go standby
[all]
heartbeat[29627]: 2008/12/12_17:38:38 WARN: No reply to standby request.
Standby request cancelled.
heartbeat[29627]: 2008/12/13_16:40:58 info: Daily informational memory
statistics
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 8/294421 ms age 0
[pid29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29627/MST_CONTROL]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/73 ms age 82942080
[pid29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29629/HBFIFO]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484
[pid29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29630/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/0 ms age 281644484
[pid29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29631/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 1/190354 ms age 0
[pid29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29632/HBWRITE]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: MSG stats: 0/86405 ms age 10
[pid29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: cl_malloc stats: 0/0  0/0
[pid29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: RealMalloc stats: 0 total malloc
bytes. pid [29633/HBREAD]
heartbeat[29627]: 2008/12/13_16:40:58 info: Current arena value: 0
heartbeat[29627]: 2008/12/13_16:40:58 info: These are nothing to worry
about.
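
To pick the WARN and ERROR entries out of a log like the one above, a small filter along these lines may help. It is only a sketch: it assumes each entry is on a single line, as in the original /var/log/ha-debug (the lines above are wrapped by the mail client), and the names parse_line and problems are hypothetical. The sample lines are copied from the log above with the wrapping undone.

```python
import re
from datetime import datetime

# Matches entries such as:
#   heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message ...
LINE_RE = re.compile(
    r"^(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log entry into its fields; return None for other output."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "time": datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S"),
        "level": m.group("level").upper(),
        "msg": m.group("msg"),
    }


def problems(lines):
    """Return only the parsed entries logged at WARN or ERROR level."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e["level"] in ("WARN", "ERROR")]


SAMPLE = [
    "heartbeat[29627]: 2008/12/12_17:37:19 WARN: standby message [me] from "
    "test-srm-b ignored.  Other side is in flux.",
    "ipfail[29666]: 2008/12/12_17:37:20 debug: Other side is now stable.",
    "heartbeat[29627]: 2008/12/12_17:37:20 ERROR: Ignored standby message "
    "'other' from test-srm-a in state 0",
    "GSRM has the following PIDs: 5599",
]

for entry in problems(SAMPLE):
    print(entry["time"], entry["level"], entry["msg"])
```

Lines printed by the resource scripts themselves, such as "GSRM has the following PIDs: 5599", do not match the pattern and are skipped.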