[Linux-HA] Strange Behaviour when master reboots meanwhile services running on slave

Jerome Caffet Wed, 12 Mar 2008 08:57:02 -0700

Hi

I have a 2-node cluster (node1 and node2) with a simple version 1 configuration.
node1 is the primary.


ha.cf :
bcast eth1
baud 19200
serial /dev/ttyS0
debugfile /var/log/heartbeat-debug.log
logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 10
warntime 6
initdead 20
udpport 1001
node node1 node2
auto_failback off

haresources :

node1 Filesystem::/dev/emcpowera::/data::ext3 mysqld IPaddr::xxx.xxx.xxx.xxx ldap jboss httpd
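For reference, heartbeat v1 reads a haresources line as a node name followed by resources. Within each resource, `::` separates the script name from its arguments. Resources are started left to right and released in reverse order. The Python sketch below uses hypothetical helper names and is only an illustration, not heartbeat code. It shows the commands this implies for a line like the one above, including the bare `ldap start` call that matters in the logs further down.

```python
def parse_haresources_line(line):
    """Split a heartbeat v1 haresources line into (node, [(resource, args)])."""
    tokens = line.split()
    node, resources = tokens[0], []
    for token in tokens[1:]:
        # "Filesystem::/dev/x::/data::ext3" -> ("Filesystem", ["/dev/x", "/data", "ext3"])
        name, *args = token.split("::")
        resources.append((name, args))
    return node, resources


def start_commands(line):
    """Commands for acquiring the group, in left-to-right order."""
    _, resources = parse_haresources_line(line)
    return [[name, *args, "start"] for name, args in resources]


def stop_commands(line):
    """Commands for releasing the group, in reverse order of the start list."""
    _, resources = parse_haresources_line(line)
    return [[name, *args, "stop"] for name, args in reversed(resources)]
```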



Last week, node1 broke down and node2 took over the services. It worked well.

When node1 booted again, the services stayed on node2 because of "auto_failback off" (which is what we want).

But to check node1 once again, I rebooted it, and I lost my services on node2.

In fact, node1 announced that it was shutting down:

heartbeat[5028]: 2008/03/07_10:47:40 info: Heartbeat shutdown in progress. (5028)

heartbeat[17733]: 2008/03/07_10:47:40 info: Giving up all HA resources.

ResourceManager[17746]: 2008/03/07_10:47:40 info: Releasing resource group: node1 Filesystem::/dev/emcpowera1::/data::ext3 IPaddr::xxx.xxx.xxx.xxx mysqld ldap jboss httpd


On node2 :

heartbeat[4975]: 2008/03/07_10:47:41 info: Received shutdown notice from 'node2'.
heartbeat[4975]: 2008/03/07_10:47:41 info: Resources being acquired from node2.
heartbeat[30169]: 2008/03/07_10:47:41 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat[30170]: 2008/03/07_10:47:41 info: No local resources [/opt/heartbeat/share/heartbeat/ResourceManager listkeys node2] to acquire.
harc[30169]: 2008/03/07_10:47:41 info: Running /etc/ha.d/rc.d/status status
heartbeat[4975]: 2008/03/07_10:47:41 debug: StartNextRemoteRscReq(): child count 1
mach_down[30198]: 2008/03/07_10:47:41 info: Taking over resource group Filesystem::/dev/emcpowera1::/data::ext3
ResourceManager[30224]: 2008/03/07_10:47:41 info: Acquiring resource group: node1 Filesystem::/dev/emcpowera1::/data::ext3 IPaddr::xxx.xxx.xxx.xxx mysqld ldap jboss httpd

Filesystem[30252]:      2008/03/07_10:47:42 INFO:  Running OK
IPaddr[30312]:  2008/03/07_10:47:42 INFO:  Running OK

But because ldap was already running, its start returned exit code 1, so heartbeat decided to stop all the services on node2.
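Heartbeat v1 runs plain init scripts listed in haresources as if they were LSB init scripts. The LSB spec says that `start` on a service that is already running counts as success and should exit 0. One way to check whether a script such as ldap follows this rule is to run `start` twice in a row and compare the two exit codes. The Python sketch below does that check. The helper names `start_exit_codes` and `start_is_idempotent` are hypothetical and not part of heartbeat. Run it only on a test node, because it really starts the service.

```python
import subprocess


def start_exit_codes(script, runs=2):
    """Run `<script> start` several times in a row and collect the exit codes.

    For an LSB-compliant init script every run should exit 0, including the
    runs made while the service is already up after the first call.
    """
    codes = []
    for _ in range(runs):
        result = subprocess.run(
            [script, "start"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


def start_is_idempotent(script):
    """True if repeated `start` calls all report success (exit status 0)."""
    return all(code == 0 for code in start_exit_codes(script))
```

Usage would look like `start_is_idempotent("/etc/init.d/ldap")`. That path is an assumption, so use whichever copy of the script heartbeat actually picks up. If the function returns False, one likely fix is to make the script's start branch exit 0 when the service is already up, either in the script itself or in a small wrapper placed in /etc/ha.d/resource.d/.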


So why did node2 decide to start services that were ALREADY running?
How can I avoid this?

Thanks in advance

Jerome




_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
