Hi
I have a 2-node cluster (node1 and node2) with a simple version 1
configuration. node1 is the primary.
ha.cf :
bcast eth1
baud 19200
serial /dev/ttyS0
debugfile /var/log/heartbeat-debug.log
logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 10
warntime 6
initdead 20
udpport 1001
node node1 node2
auto_failback off
haresources :
node1 Filesystem::/dev/emcpowera::/data::ext3 mysqld
IPaddr::xxx.xxx.xxx.xxx ldap jboss httpd
Last week, node1 broke down and node2 took over the services. That
worked well.
When node1 booted again, the services stayed on node2 because of
"auto_failback off" (which is what we want).
But to check node1 once more, I rebooted it, and I lost my services on
node2.
In fact, node1 sent a message saying it was shutting down:
heartbeat[5028]: 2008/03/07_10:47:40 info: Heartbeat shutdown in
progress. (5028)
heartbeat[17733]: 2008/03/07_10:47:40 info: Giving up all HA resources.
ResourceManager[17746]: 2008/03/07_10:47:40 info: Releasing resource
group: node1 Filesystem::/dev/emcpowera1::/data::ext3
IPaddr::xxx.xxx.xxx.xxx mysqld ldap jboss httpd
On node2 :
heartbeat[4975]: 2008/03/07_10:47:41 info: Received shutdown notice from
'node2'.
heartbeat[4975]: 2008/03/07_10:47:41 info: Resources being acquired from
node2.
heartbeat[30169]: 2008/03/07_10:47:41 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[30170]: 2008/03/07_10:47:41 info: No local resources
[/opt/heartbeat/share/heartbeat/ResourceManager listkeys node2] to acquire.
harc[30169]: 2008/03/07_10:47:41 info: Running /etc/ha.d/rc.d/status
status
heartbeat[4975]: 2008/03/07_10:47:41 debug: StartNextRemoteRscReq():
child count 1
mach_down[30198]: 2008/03/07_10:47:41 info: Taking over resource
group Filesystem::/dev/emcpowera1::/data::ext3
ResourceManager[30224]: 2008/03/07_10:47:41 info: Acquiring resource
group: node1 Filesystem::/dev/emcpowera1::/data::ext3
IPaddr::xxx.xxx.xxx.xxx mysqld ldap jboss httpd
Filesystem[30252]: 2008/03/07_10:47:42 INFO: Running OK
IPaddr[30312]: 2008/03/07_10:47:42 INFO: Running OK
But because ldap was already up, its start script returned code 1, and
so heartbeat decided to stop all the services on node2.
So why did node2 decide to start services that were ALREADY running?
How can I avoid this case?
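
To illustrate the failure mode described above, here is a minimal
sketch of a wrapper that treats "start" on an already running service
as success. It assumes the wrapped script answers "status" with exit
code 0 when the service is up, as LSB-style init scripts do. The names
is_running and idempotent_start are hypothetical and are not part of
heartbeat.

```shell
#!/bin/sh
# Hypothetical helpers, not part of heartbeat: a sketch of making
# "start" idempotent for a script that returns 1 when already running.

# is_running CMD...: succeeds if "CMD status" exits with 0.
# Assumption: the wrapped script implements an LSB-style "status".
is_running() {
    "$@" status >/dev/null 2>&1
}

# idempotent_start CMD...: runs "CMD start" only when the service is
# not already up, so an already running service yields exit code 0.
idempotent_start() {
    if is_running "$@"; then
        return 0
    fi
    "$@" start
}
```

Under that assumption, something like "idempotent_start /etc/init.d/ldap"
would return 0 on node2 even when ldap is already up, instead of the
return code 1 seen above.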
Thanks in advance
Jerome