Hello,

I'm using DRBD with heartbeat (R1 config) and ipfail.

Here are my versions:
CentOS 5.3
heartbeat-2.1.3-3.el5.centos
drbd-km-2.6.18_128.1.14.el5-8.3.1-3
drbd-8.3.1-3 (compiled from source and installed with rpm)

Each node (mail1 and mail2) has two interfaces:

eth0 -> heartbeat link, vip and uplink
eth1 -> heartbeat link and DRBD replication link

To ensure a failover if one NIC fails, I set ipfail to ping the gateway.
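
For readers without the attachment, an ipfail setup of this kind usually comes down to a few ha.cf lines. The fragment below is only a sketch and not a copy of the attached ha.cf: the ping node 10.9.9.14 and the node names are taken from the logs in this post, while the bcast line, the auto_failback value and the ipfail path are assumptions.

```
# Sketch of an ha.cf fragment (assumed values, not the attached file)
bcast eth0 eth1                 # heartbeat over both links (assumption)
node mail1.example.com
node mail2.example.com
ping 10.9.9.14                  # ping node that ipfail watches (from the logs)
respawn hacluster /usr/lib64/heartbeat/ipfail   # path assumed for a 64-bit install
auto_failback off               # assumption
```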

First I tested my configuration with a firewall rule to trigger ipfail:

# iptables -A OUTPUT -p icmp --icmp-type 8 -j DROP

That worked perfectly well.
You can see it in the attached files messages.mail1.iptables and
messages.mail2.iptables.
It takes over the vip and drbddisk starts without an error.

But if I pull the cable (or run ip link set eth0 down) on mail1's
eth0, the drbddisk resource doesn't get stopped on mail1,
so it fails to start on mail2.
You can see that in messages.mail1.linkdown and messages.mail2.linkdown.

Jul 23 15:51:12 mail2 ResourceManager[11462]: [11696]: ERROR: Return
code 1 from /etc/ha.d/resource.d/drbddisk
Jul 23 15:51:12 mail2 ResourceManager[11462]: [11697]: CRIT: Giving up
resources due to failure of drbddisk::home
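
When comparing the logs of both nodes, it can help to reduce them to the ResourceManager errors and the DRBD state changes. The following is a minimal sketch; the function name filter_takeover and the log path in the usage comment are assumptions, not part of the setup described here.

```shell
# Sketch: keep only the syslog lines that matter for a failed takeover.
# Matches ResourceManager ERROR/CRIT messages and DRBD kernel messages
# that report a role, peer, or connection state change.
filter_takeover() {
    grep -E 'ResourceManager.*(ERROR|CRIT)|kernel: drbd[0-9]+: (role|peer|conn)\(' "$1"
}

# Usage (log path assumed, adjust as needed):
#   filter_takeover /var/log/messages
```

Running it on both nodes and lining up the timestamps shows whether mail1 ever logged a Primary -> Secondary transition before mail2 tried to start drbddisk.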

You can also see these files attached:
ha.cf
haresources
drbd.conf

I don't really know if it's a problem with heartbeat or DRBD, but I
hope you can give me a hint.

Thank you for your help!

Attachment: drbd.conf
Description: Binary data

Attachment: ha.cf
Description: Binary data

Attachment: haresources
Description: Binary data

Jul 23 16:06:41 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:41 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:41 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:43 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:43 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:43 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:45 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:45 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:45 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:47 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:47 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:47 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:49 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:49 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:49 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:51 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:51 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:51 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:53 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:53 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:53 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:55 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:55 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:55 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:57 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:57 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:57 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:59 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:06:59 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:06:59 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Status update: Node 10.9.9.14 now has status dead
Jul 23 16:06:59 mail1 heartbeat: [4553]: WARN: node 10.9.9.14: is dead
Jul 23 16:06:59 mail1 heartbeat: [4553]: info: Link 10.9.9.14:10.9.9.14 dead.
Jul 23 16:06:59 mail1 harc[4928]: [4934]: info: Running /etc/ha.d/rc.d/status status
Jul 23 16:06:59 mail1 heartbeat: [4553]: info: Managed status process 4928 exited with return code 0.
Jul 23 16:06:59 mail1 ipfail: [4581]: info: NS: We are dead. :<
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Link Status update: Link 10.9.9.14/10.9.9.14 now has status dead
Jul 23 16:06:59 mail1 ipfail: [4581]: info: We are dead. :<
Jul 23 16:06:59 mail1 ipfail: [4581]: info: Asking other side for ping node count.
Jul 23 16:07:00 mail1 ipfail: [4581]: info: Giving up because we were told that we have less ping nodes.
Jul 23 16:07:00 mail1 ipfail: [4581]: info: Delayed giveup in 4 seconds.
Jul 23 16:07:01 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:01 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:01 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:03 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:03 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:03 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:04 mail1 ipfail: [4581]: info: giveup() called (timeout worked)
Jul 23 16:07:05 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:05 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:05 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: mail1.example.com wants to go standby [all]
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: i_hold_resources: 1
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: New standby state: 1
Jul 23 16:07:05 mail1 heartbeat: [4553]: WARN: Standby timer has 9490 ms left
Jul 23 16:07:05 mail1 last message repeated 2 times
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: standby: mail2.example.com can take our all resources
Jul 23 16:07:05 mail1 heartbeat: [4553]: WARN: Standby timer has 3600000 ms left
Jul 23 16:07:05 mail1 heartbeat: [4553]: info: New standby state: 1
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: give up all HA resources (standby).
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: go_standby: who: 1 resource set: all
Jul 23 16:07:05 mail1 heartbeat: [4952]: info: go_standby: (query/action): (allkeys/givegroup)
Jul 23 16:07:05 mail1 ResourceManager[4965]: [4976]: info: Releasing resource group: mail1.example.com IPaddr2::10.9.9.6/28/eth0/10.9.9.15 drbddisk::home
Jul 23 16:07:05 mail1 ResourceManager[4965]: [4991]: info: Running /etc/ha.d/resource.d/drbddisk home stop
Jul 23 16:07:05 mail1 kernel: drbd0: role( Primary -> Secondary ) 
Jul 23 16:07:05 mail1 ResourceManager[4965]: [5012]: info: Running /etc/ha.d/resource.d/IPaddr2 10.9.9.6/28/eth0/10.9.9.15 stop
Jul 23 16:07:06 mail1 IPaddr2[5043]: [5072]: INFO: ip -f inet addr delete 10.9.9.6/28 dev eth0
Jul 23 16:07:06 mail1 IPaddr2[5043]: [5074]: INFO: ip -o -f inet addr show eth0
Jul 23 16:07:06 mail1 IPaddr2[5014]: [5076]: INFO:  Success
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: all HA resource release completed (standby).
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: Writing type [ask_resources] message to FIFO
Jul 23 16:07:06 mail1 heartbeat: [4952]: info: FIFO message [type ask_resources] written rc=47
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599870 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Local standby process completed [all].
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: New standby state: 3
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Managed go_standby process 4952 exited with return code 0.
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599640 ms left
Jul 23 16:07:06 mail1 kernel: drbd0: peer( Secondary -> Primary ) 
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: 1 lost packet(s) for [mail2.example.com] [113:115]
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: remote resource transition completed.
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: other_holds_resources: 3
Jul 23 16:07:06 mail1 heartbeat: [4553]: WARN: Standby timer has 3599130 ms left
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: No pkts missing from mail2.example.com!
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: Other node completed standby takeover of all resources.
Jul 23 16:07:06 mail1 heartbeat: [4553]: info: New standby state: 0
Jul 23 16:07:07 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:07 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:07 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:07 mail1 heartbeat: [4553]: info: other_holds_resources: 3
Jul 23 16:07:09 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:09 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:09 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:11 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:11 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:11 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:13 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:13 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:13 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:15 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:15 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:15 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:17 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:17 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:17 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:19 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:19 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:19 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted
Jul 23 16:07:21 mail1 heartbeat: [4565]: ERROR: glib: Error sending packet: Operation not permitted
Jul 23 16:07:21 mail1 heartbeat: [4565]: info: glib: euid=0 egid=0
Jul 23 16:07:21 mail1 heartbeat: [4565]: ERROR: write_child: write failure on ping 10.9.9.14.: Operation not permitted

Attachment: messages.mail1.linkdown
Description: Binary data

Jul 23 16:04:02 mail2 heartbeat: [12163]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Jul 23 16:04:02 mail2 heartbeat: [12163]: info: other_holds_resources: 1
Jul 23 16:04:02 mail2 heartbeat: [12163]: info: other_holds_resources: 1
Jul 23 16:07:00 mail2 ipfail: [12180]: info: Telling other node that we have more visible ping nodes.
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: mail1.example.com wants to go standby [all]
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: standby: other_holds_resources: 1
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: New standby state: 2
Jul 23 16:07:05 mail2 heartbeat: [12163]: info: New standby state: 2
Jul 23 16:07:05 mail2 heartbeat: [12163]: WARN: Standby timer has 3600000 ms left
Jul 23 16:07:05 mail2 kernel: drbd0: peer( Primary -> Secondary ) 
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: other_holds_resources: 0
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: WARN: Standby timer has 3599490 ms left
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: standby: acquire [all] resources from mail1.example.com
Jul 23 16:07:06 mail2 heartbeat: [12163]: info: New standby state: 3
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: acquire all HA resources (standby).
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: go_standby: who: 2 resource set: all
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: go_standby: (query/action): (allkeys/takegroup)
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12256]: info: Acquiring resource group: mail1.example.com IPaddr2::10.9.9.6/28/eth0/10.9.9.15 drbddisk::home
Jul 23 16:07:06 mail2 IPaddr2[12268]: [12325]: INFO:  Resource is stopped
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12339]: info: Running /etc/ha.d/resource.d/IPaddr2 10.9.9.6/28/eth0/10.9.9.15 start
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12405]: INFO: ip -f inet addr add 10.9.9.6/28 brd 10.9.9.15 dev eth0
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12407]: INFO: ip link set eth0 up
Jul 23 16:07:06 mail2 IPaddr2[12370]: [12409]: INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.9.9.6 eth0 10.9.9.6 auto not_used not_used
Jul 23 16:07:06 mail2 IPaddr2[12341]: [12413]: INFO:  Success
Jul 23 16:07:06 mail2 ResourceManager[12245]: [12443]: info: Running /etc/ha.d/resource.d/drbddisk home start
Jul 23 16:07:06 mail2 kernel: drbd0: role( Secondary -> Primary ) 
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: all HA resource acquisition completed (standby).
Jul 23 16:07:06 mail2 heartbeat: [12232]: info: Writing type [ask_resources] message to FIFO

Attachment: messages.mail2.linkdown
Description: Binary data
