Hello, I'm using DRBD with heartbeat (R1 config) and ipfail.

Here are my versions:

CentOS 5.3
heartbeat-2.1.3-3.el5.centos
drbd-km-2.6.18_128.1.14.el5-8.3.1-3
drbd-8.3.1-3 (compiled from source and installed with rpm)

Each node (mail1 and mail2) has two interfaces:

eth0 -> heartbeat link, VIP and uplink
eth1 -> heartbeat link and DRBD replication link

To ensure a failover if one NIC fails, I set ipfail to ping the gateway.

First I tested my configuration with a firewall rule to trigger ipfail:

# iptables -A OUTPUT -p icmp --icmp-type 8 -j DROP

That worked perfectly well. You can see it in the attached files messages.mail1.iptables and messages.mail2.iptables. It takes over the VIP and drbddisk starts without an error.

But if I pull the cable (or run ip link set eth0 down) on mail1's eth0, the drbddisk resource doesn't get stopped on mail1, and so it fails to start on mail2. You can see that in messages.mail1.linkdown and messages.mail2.linkdown:

Jul 23 15:51:12 mail2 ResourceManager[11462]: [11696]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
Jul 23 15:51:12 mail2 ResourceManager[11462]: [11697]: CRIT: Giving up resources due to failure of drbddisk::home

These files are also attached: ha.cf, haresources, drbd.conf.

I don't really know if it's a problem with heartbeat or DRBD, but I hope you can give me a hint.

Thank you for your help!

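For orientation, the relevant parts of ha.cf and haresources for a setup like this might look roughly as follows. This is only a sketch: the haresources line, the ping node 10.9.9.14, and the /usr/lib64/heartbeat path are taken from the logs, while the bcast, auto_failback, and apiauth lines are assumptions. The attached files are authoritative.

```text
# /etc/ha.d/ha.cf (sketch, not the attached file)
# assumed: heartbeat packets are sent over both links
bcast eth0 eth1
# assumed
auto_failback off
node mail1.example.com
node mail2.example.com
# ping node that appears in the logs (the gateway)
ping 10.9.9.14
# ipfail is started by heartbeat and watches the ping node
respawn hacluster /usr/lib64/heartbeat/ipfail
# assumed: allow ipfail to use the heartbeat API
apiauth ipfail gid=haclient uid=hacluster

# /etc/ha.d/haresources (resource group as it appears in the logs)
mail1.example.com IPaddr2::10.9.9.6/28/eth0/10.9.9.15 drbddisk::home
```
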
drbd.conf
ha.cf
haresources
Jul 23 16:06:41 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:41 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:41 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:43 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:43 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:43 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:45 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:45 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:45 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:47 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:47 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:47 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:49 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:49 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:49 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:51 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:51 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:51 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:53 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:53 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:53 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:55 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:55 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:55 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:57 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:57 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:57 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:59 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:59 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:59 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Status update: Node 10.9.9.14 now has status dead
Jul 23 16:06:59 mail1 heartbeat: [4553]: WARN: node 10.9.9.14: is dead
Jul 23 16:06:59 mail1 heartbeat: [4553]: info: Link 10.9.9.14:10.9.9.14 dead.
Jul 23 16:06:59 mail1 harc[4928]: [4934]: info: Running /etc/ha.d/rc.d/status status
Jul 23 16:06:59 mail1 heartbeat: [4553]: info: Managed status process 4928 exited with return code 0.
Jul 23 16:06:59 mail1 ipfail: [4581]: info: NS: We are dead. :<
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Link Status update: Link 10.9.9.14/10.9.9.14 now has status dead
Jul 23 16:06:59 mail1 ipfail: [4581]: info: We are dead. :<
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Asking other side for ping node count.
Jul 23 16:07:00 mail1 ipfail: [4581]: info: Giving up because we were told that we have less ping nodes.
Jul 23 16:07:00 mail1 ipfail: [4581]: info: Delayed giveup in 4 seconds.
Jul 23 16:07:01 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:01 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:01 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:03 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:03 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:03 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:04 mail1 ipfail: [4581]: info: giveup() called (timeout worked)
Jul 23 16:07:05 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:05 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:05 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: mail1.example.com wants to go standby [all]
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: i_hold_resources: 1
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: New standby state: 1
Jul 23 16:07:05 mail1 heartbeat: [4553]: WARN: Standby timer has 9490 ms left
Jul 23 16:07:05 mail1 last message repeated 2 times
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: standby: mail2.example.com can take our all resources
Jul 23 16:07:05 mail1 heartbeat: [4553]: WARN: Standby timer has 3600000 ms left
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: New standby state: 1
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: give up all HA resources (standby).
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: go_standby: who: 1 resource set: all
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: go_standby: (query/action): (allkeys/givegroup)
Jul 23 16:07:05 mail1 ResourceManager[4965]: [4976]: info: Releasing resource group: mail1.example.com IPaddr2::10.9.9.6/28/eth0/10.9.9.15 drbddisk::home
Jul 23 16:07:05 mail1 ResourceManager[4965]: [4991]: info: Running /etc/ha.d/resource.d/drbddisk home stop
Jul 23 16:07:05 mail1 kernel: drbd0: role( Primary -> Secondary )
Jul 23 16:07:05 mail1 ResourceManager[4965]: [5012]: info: Running /etc/ha.d/resource.d/IPaddr2 10.9.9.6/28/eth0/10.9.9.15 stop
Jul 23 16:07:06 mail1 IPaddr2[5043]: [5072]: INFO: ip -f inet addr delete 10.9.9.6/28 dev eth0
Jul 23 16:07:06 mail1 IPaddr2[5043]: [5074]: INFO: ip -o -f inet addr show eth0
Jul 23 16:07:06 mail1 IPaddr2[5014]: [5076]: INFO: Success
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: all HA resource release completed (standby).
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: Writing type [ask_resources] message to FIFO
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: FIFO message [type ask_resources] written rc=47
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599870 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Local standby process completed [all].
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: New standby state: 3
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Managed go_standby process 4952 exited with return code 0.
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599640 ms left
Jul 23 16:07:06 mail1 kernel: drbd0: peer( Secondary -> Primary )
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: 1 lost packet(s) for [mail2.example.com] [113:115]
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: remote resource transition completed.
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: other_holds_resources: 3
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: No pkts missing from mail2.example.com!
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Other node completed standby takeover of all resources.
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: New standby state: 0
Jul 23 16:07:07 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:07 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:07 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:07 mail1 heartbeat: [4553]: info: other_holds_resources: 3
Jul 23 16:07:09 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:09 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:09 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:11 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:11 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:11 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:13 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:13 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:13 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:15 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:15 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:15 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:17 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:17 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:17 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:19 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:19 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:19 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:21 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:21 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:21 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
messages.mail1.linkdown
Jul 23 16:04:02 mail2 heartbeat: [12163]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Jul 23 16:04:02 mail2 heartbeat: [12163]: info: other_holds_resources: 1
Jul 23 16:04:02 mail2 heartbeat: [12163]: info: other_holds_resources: 1
Jul 23 16:07:00 mail2 ipfail: [12180]: info: Telling other node that we have more visible ping nodes.
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: mail1.example.com wants to go standby [all]
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: standby: other_holds_resources: 1
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: New standby state: 2
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: New standby state: 2
Jul 23 16:07:05 mail2 heartbeat: [12163]: WARN: Standby timer has 3600000 ms left
Jul 23 16:07:05 mail2 kernel: drbd0: peer( Primary -> Secondary )
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: other_holds_resources: 0
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: standby: acquire [all] resources from mail1.example.com
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: New standby state: 3
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: acquire all HA resources (standby).
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: go_standby: who: 2 resource set: all
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: go_standby: (query/action): (allkeys/takegroup)
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12256]: info: Acquiring resource group: mail1.example.com IPaddr2::10.9.9.6/28/eth0/10.9.9.15 drbddisk::home
Jul 23 16:07:06 mail2 IPaddr2[12268]: [12325]: INFO: Resource is stopped
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12339]: info: Running /etc/ha.d/resource.d/IPaddr2 10.9.9.6/28/eth0/10.9.9.15 start
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12405]: INFO: ip -f inet addr add 10.9.9.6/28 brd 10.9.9.15 dev eth0
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12407]: INFO: ip link set eth0 up
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12409]: INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.9.9.6 eth0 10.9.9.6 auto not_used not_used
Jul 23 16:07:06 mail2 IPaddr2[12341]: [12413]: INFO: Success
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12443]: info: Running /etc/ha.d/resource.d/drbddisk home start
Jul 23 16:07:06 mail2 kernel: drbd0: role( Secondary -> Primary )
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: all HA resource acquisition completed (standby).
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: Writing type [ask_resources] message to FIFO
messages.mail2.linkdown
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user
