Hello,
I have a problem with my MySQL+DRBD+Heartbeat setup that I don't know how to
resolve. There are two nodes, connected to each other by a dedicated
replication link (crossover cable) and both attached to the same external
network. A virtual IP is moved to the active node by Heartbeat.
The nodes run in a Primary/Secondary setup. Heartbeats are sent over both
networks.
My /etc/ha.d/ha.cf (identical on both nodes) looks like this:
logfacility local0
keepalive 1
deadtime 5
warntime 3
initdead 45
ucast eth0 192.168.23.241
ucast eth0 192.168.23.242
bcast eth1
auto_failback off
node ubuntubox1
node ubuntubox2
When node1 is running as Primary and I power off the machine, node2 detects
this and takes over as it should. But when I unplug the network cable (not the
crossover one), the dead link is detected, but since heartbeats also travel
over the replication link, the peer can still be heard and nothing is done. In
this case I would want a failover so that the Secondary node becomes Primary.
I tried removing the heartbeat over the replication link (eth1 in the config
file) so that the disconnect would be detected. This gave me other problems:
both boxes seem to think they are the surviving one, and since the DRBD
replication link is still up, Box1 (the one that was Secondary) tries to take
over the resources, but they are still held by Box2 (Primary), so the takeover
fails. Logs below:
Box1 (Secondary):
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: node ubuntubox2: is dead
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: No STONITH device
configured.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: Shared disks are not
protected.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Resources being acquired
from ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Link ubuntubox2:eth0 dead.
Mar 21 12:21:46 ubuntubox1 heartbeat: [22939]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[22939]: info: Running /etc/ha.d//rc.d/status
status
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: mach_down takeover complete
for node ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: mach_down takeover complete.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq():
child count 1
Mar 21 12:21:46 ubuntubox1 IPaddr2[23008]: INFO: Resource is stopped
Mar 21 12:21:46 ubuntubox1 heartbeat: [22940]: info: Local Resource acquisition
completed.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq():
child count 1
Mar 21 12:21:46 ubuntubox1 heartbeat: [23094]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[23094]: info: Running
/etc/ha.d//rc.d/ip-request-resp ip-request-resp
Mar 21 12:21:46 ubuntubox1 ip-request-resp[23094]: received ip-request-resp
IPaddr2::192.168.23.240/24/eth0/192.168.23.255 OK yes
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Acquiring resource
group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk
Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:46 ubuntubox1 IPaddr2[23145]: INFO: Resource is stopped
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 start
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip -f inet addr add
192.168.23.240/24 brd 192.168.23.255 dev eth0
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip link set eth0 up
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: /usr/lib/heartbeat/send_arp -i
200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.23.240 eth0
192.168.23.240 auto not_used not_used
Mar 21 12:21:46 ubuntubox1 IPaddr2[23228]: INFO: Success
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/drbddisk start
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: ERROR: Return code 1 from
/etc/ha.d/resource.d/drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: CRIT: Giving up resources
due to failure of drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Releasing resource
group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk
Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/mysql stop
Mar 21 12:21:58 ubuntubox1 mysql[23446]: Rather than invoking init scripts
through /etc/init.d, use the service(8) utility, e.g. service mysql stop Since
the script you are attempting to invoke has been converted to an Upstart job,
you may also use the stop(8) utility, e.g. stop mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
Mar 21 12:21:59 ubuntubox1 Filesystem[23490]: INFO: Running stop for /dev/drbd0
on /mnt/drbd
Mar 21 12:21:59 ubuntubox1 Filesystem[23482]: INFO: Success
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/drbddisk stop
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 stop
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: IP status = ok, IP_CIP=
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: ip -f inet addr delete
192.168.23.240/24 dev eth0
Mar 21 12:21:59 ubuntubox1 IPaddr2[23585]: INFO: Success
Mar 21 12:22:29 ubuntubox1 hb_standby[23790]: Going standby [foreign].
Mar 21 12:22:29 ubuntubox1 heartbeat: [8324]: ERROR: msg2string: Message with
zero fields
Box2 (Primary):
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: WARN: node ubuntubox1: is dead
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Dead node ubuntubox1 gave
up resources.
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Link ubuntubox1:eth0 dead.
If I plug the network cable back in, the cluster seems to recover and Box1
becomes Primary, which is fine. But is there any way to set this up so that
when a box loses its external network connection (while the replication link
is still up), the resources switch over to the other box?
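
To make clear what behaviour I am after, here is a rough Python sketch of a
watchdog I could imagine running on the active node. It pings a host on the
external network and, after several consecutive misses, asks Heartbeat to hand
the resources to the peer. The address 192.168.23.1, the miss threshold, and
the hb_standby path are assumptions on my part (only the log above tells me
hb_standby exists on my system), so treat this as an illustration and not a
tested solution.

```python
import subprocess
import time

GATEWAY = "192.168.23.1"  # assumption: an always-up host on the eth0 network
HB_STANDBY = "/usr/share/heartbeat/hb_standby"  # assumption: path varies by distro
MAX_MISSES = 3  # assumption: consecutive failed pings before giving up


def host_reachable(host, timeout_s=1):
    """Return True if a single ICMP echo is answered (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def count_misses(misses, reachable):
    """Consecutive-miss counter: reset on success, increment on failure."""
    return 0 if reachable else misses + 1


def should_fail_over(misses, max_misses=MAX_MISSES):
    """True once the external network has been silent for max_misses checks."""
    return misses >= max_misses


def watch(interval_s=2):
    """Poll the gateway and request standby once it is considered lost."""
    misses = 0
    while not should_fail_over(misses):
        misses = count_misses(misses, host_reachable(GATEWAY))
        time.sleep(interval_s)
    # The standby request reaches the peer over the replication link,
    # which is still up in the scenario described above.
    subprocess.call([HB_STANDBY, "all"])
```

Calling watch() on the active node would start the polling loop. This relies
on the heartbeat over eth1 staying in ha.cf, so that the peer is still
reachable when the standby request is made.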
Also, if MySQL itself (the software, that is) were to crash, this would not be
detected either. Is there a way to get Heartbeat to check that MySQL is running
and fail over if the software crashes?
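
Again only to illustrate the kind of check I mean, the following Python sketch
tests whether something accepts TCP connections on the MySQL port and requests
standby after repeated failures. The host 127.0.0.1, port 3306, the threshold,
and the hb_standby path are assumptions. A successful connect only shows that
the port is open, not that queries actually work.

```python
import socket
import subprocess
import time

MYSQL_HOST = "127.0.0.1"  # assumption: mysqld listens on TCP locally
MYSQL_PORT = 3306         # assumption: default MySQL port
HB_STANDBY = "/usr/share/heartbeat/hb_standby"  # assumption: path varies by distro
MAX_MISSES = 3            # assumption: consecutive failures before failover


def mysql_port_open(host=MYSQL_HOST, port=MYSQL_PORT, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def watch_mysql(interval_s=5):
    """Poll the MySQL port; hand resources to the peer after MAX_MISSES failures."""
    misses = 0
    while misses < MAX_MISSES:
        misses = 0 if mysql_port_open() else misses + 1
        time.sleep(interval_s)
    subprocess.call([HB_STANDBY, "all"])
```

This would only make sense on the node that currently holds the resources,
since MySQL is stopped on the Secondary by design.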
Best Regards
Gustav Reiz