Hello,
 
I have a problem with my MySQL + DRBD + Heartbeat setup that I don't know how
to resolve. There are two nodes, connected to each other by a dedicated
replication link (a crossover cable) and both attached to the same
outward-facing network. A virtual IP is switched over to the active node by
Heartbeat. The nodes run in a Primary/Secondary setup, and heartbeats are sent
over both networks.
My /etc/ha.d/ha.cf (identical on both nodes) looks like this:
logfacility local0
keepalive 1
deadtime 5
warntime 3
initdead 45
ucast eth0 192.168.23.241
ucast eth0 192.168.23.242
bcast eth1
auto_failback off
node ubuntubox1
node ubuntubox2
 
When node1 is running as Primary and I kill the machine, node2 detects this
and takes over as it should. But when I unplug the network cable (not the
crossover one), the link failure is detected, yet because the heartbeat also
travels over the replication link, the peer can still be heard and nothing is
done. In this case I would want to fail over so that the Secondary node becomes
Primary. I tried removing the heartbeat over the replication link (eth1 in the
config file) so that the disconnect would be detected. This gave me other
problems: both boxes seem to think they are the surviving one. Because the DRBD
replication link is still up, Box1 (the one that was Secondary) tries to take
over the resources, but they are still held by Box2 (Primary), so the takeover
fails. Logs below:
 
Box1 (Secondary):
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: node ubuntubox2: is dead
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: No STONITH device configured.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: Shared disks are not protected.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Resources being acquired from ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Link ubuntubox2:eth0 dead.
Mar 21 12:21:46 ubuntubox1 heartbeat: [22939]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[22939]: info: Running /etc/ha.d//rc.d/status status
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: mach_down takeover complete for node ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: mach_down takeover complete.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
Mar 21 12:21:46 ubuntubox1 IPaddr2[23008]: INFO:  Resource is stopped
Mar 21 12:21:46 ubuntubox1 heartbeat: [22940]: info: Local Resource acquisition completed.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
Mar 21 12:21:46 ubuntubox1 heartbeat: [23094]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[23094]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Mar 21 12:21:46 ubuntubox1 ip-request-resp[23094]: received ip-request-resp IPaddr2::192.168.23.240/24/eth0/192.168.23.255 OK yes
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Acquiring resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:46 ubuntubox1 IPaddr2[23145]: INFO:  Resource is stopped
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 start
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip -f inet addr add 192.168.23.240/24 brd 192.168.23.255 dev eth0
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip link set eth0 up
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.23.240 eth0 192.168.23.240 auto not_used not_used
Mar 21 12:21:46 ubuntubox1 IPaddr2[23228]: INFO:  Success
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk  start
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: CRIT: Giving up resources due to failure of drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Releasing resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/mysql  stop
Mar 21 12:21:58 ubuntubox1 mysql[23446]: Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service mysql stop Since the script you are attempting to invoke has been converted to an Upstart job, you may also use the stop(8) utility, e.g. stop mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
Mar 21 12:21:59 ubuntubox1 Filesystem[23490]: INFO: Running stop for /dev/drbd0 on /mnt/drbd
Mar 21 12:21:59 ubuntubox1 Filesystem[23482]: INFO:  Success
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk  stop
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 stop
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: IP status = ok, IP_CIP=
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: ip -f inet addr delete 192.168.23.240/24 dev eth0
Mar 21 12:21:59 ubuntubox1 IPaddr2[23585]: INFO:  Success
Mar 21 12:22:29 ubuntubox1 hb_standby[23790]: Going standby [foreign].
Mar 21 12:22:29 ubuntubox1 heartbeat: [8324]: ERROR: msg2string: Message with zero fields
 
Box2 (Primary):
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: WARN: node ubuntubox1: is dead
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Dead node ubuntubox1 gave up resources.
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Link ubuntubox1:eth0 dead.
 
If I plug the network cable back in, the cluster seems to recover and makes
Box1 Primary, which is fine. But is there any way to make the cluster fail over
to the other box when the active box is cut off from the network while the
replication link is still up?
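
To make the desired behaviour concrete, here is a minimal Python sketch of the
failover rule described above. It is only an illustration, not something
Heartbeat runs: Heartbeat ships an ipfail helper aimed at this kind of
situation, and this code is not ipfail. The function names are hypothetical,
the gateway address 192.168.23.1 in the comment is an assumed example, and how
each node learns the peer's result (for instance over the eth1 link) is left
out entirely.

```python
import subprocess


def can_ping(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `host` succeeds.

    Uses the system `ping` binary with Linux iputils options
    (-c count, -W reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_give_up_resources(is_primary: bool,
                             local_sees_gateway: bool,
                             peer_sees_gateway: bool) -> bool:
    """Hypothetical decision rule for the unplugged-cable case.

    The Primary hands over only when it has lost the outside network
    and the peer can still reach it. If both nodes lost the gateway,
    switching over would not help, so nothing is done.
    """
    return is_primary and not local_sees_gateway and peer_sees_gateway


# Example (assumed gateway address, not taken from the setup above):
#   local_ok = can_ping("192.168.23.1")
#   if should_give_up_resources(True, local_ok, peer_sees_gateway=True):
#       ...  # ask the cluster to move the resources to the other node
```

The point of the rule is that the replication link stays in use for
heartbeats, so the two nodes never both conclude that the other one is dead.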
 
Also, if MySQL itself (the software, that is) were to crash, this would not be
detected either. Is there a way to get Heartbeat to check that MySQL is running
as well, and to fail over if the software crashes?
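
The kind of check meant here could look roughly like the following Python
sketch: an external watchdog, not a Heartbeat feature. It assumes MySQL listens
on TCP 127.0.0.1:3306 and that hb_standby (the tool that appears in the Box1
log) lives at /usr/share/heartbeat/hb_standby; both are assumptions, as are the
function names and the retry values.

```python
import socket
import subprocess
import time

# Assumed location of Heartbeat's hb_standby script; adjust for the distribution.
HB_STANDBY = "/usr/share/heartbeat/hb_standby"


def mysql_port_open(host="127.0.0.1", port=3306, timeout_s=2.0) -> bool:
    """Return True if a TCP connection to the MySQL port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def request_standby() -> None:
    """Ask the local Heartbeat to give up its resources."""
    subprocess.run([HB_STANDBY], check=False)


def watch(check=mysql_port_open, max_failures=3, interval_s=5.0,
          on_failure=request_standby, sleep=time.sleep) -> None:
    """Poll `check`; after `max_failures` consecutive failures call `on_failure`."""
    failures = 0
    while True:
        failures = 0 if check() else failures + 1
        if failures >= max_failures:
            on_failure()
            return
        sleep(interval_s)
```

Calling watch() on the active node would run the loop until MySQL has failed
three checks in a row. A TCP probe only shows that something is accepting
connections on the port; it would report a failure if MySQL is configured for
socket-only access, and it cannot tell whether queries actually succeed.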
 
Best Regards
Gustav Reiz
 