Christoph Lechner wrote:
> Hi all,
> 
> DRBD failover isn't working for me :(
> 
> I'm running Heartbeat 2.1.4 with CRM enabled in an active/active setup.
> dopd is enabled as is the drbd-peer-outdater in the drbd configuration.
> 
> All the resources depending on DRBD are located on the host running as
> DRBD master. If I kill the virtual machine running the DRBD master host,
> I'm expecting the other machine to take over after some time. But
> nothing happens, only some log messages looping over and over again, 2
> seconds between the block of messages down below popping up again in the
> syslog:
> 
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: WARN: Cluster node: 
>> rt2: status: dead
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: outdater: no 
>> message this time
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Processed 1 
>> messages
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: destroying 
>> connection: (null)
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Deleting 
>> outdater (0x8cabc88) from mainloop
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Connecting 
>> channel
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Client outdater 
>> (0x8cabc88) connected
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: invoked: outdater
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Processing msg 
>> from outdater
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Got message from 
>> (drbd-peer-outdater). (peer: rt2, res :r0)
>> Sep  7 19:51:11 rt1 /usr/lib/heartbeat/dopd: [1998]: debug: Starting node 
>> walk
>> Sep  7 19:51:13 rt1 kernel: [ 3706.060533] drbd0: helper command: 
>> /sbin/drbdadm outdate-peer minor-0 exit code 20 (0x1400)
>> Sep  7 19:51:13 rt1 kernel: [ 3706.060538] drbd0: outdate-peer helper 
>> broken, returned 20
>> Sep  7 19:51:13 rt1 kernel: [ 3706.060892] drbd0: helper command: 
>> /sbin/drbdadm outdate-peer minor-0
>> Sep  7 19:51:13 rt1 drbd-peer-outdater: [7205]: debug: message: outdater_rc, 
>> rt1
>> Sep  7 19:51:13 rt1 drbd-peer-outdater: [7209]: debug: drbd peer: rt2
>> Sep  7 19:51:13 rt1 drbd-peer-outdater: [7209]: debug: drbd resource: r0
> 
> 30 minutes now since I killed the master host.
> 
> Where's my fault?
The first fault I found was that I wasn't running Heartbeat 2.1.4 at
all, but still 2.1.3. I forgot to install the newly built deb packages.
I'm somewhat ashamed of mixing up the installed version number.

The post
> http://www.nabble.com/Re%3A--PATCH--dopd-should-notify-when-peer-is-dead-%28was-%22Refusing-to-be-Primary-while-peer-is-not-outdated%22-when-peer-is-dead-%29-p15738134.html
indicated that there was something wrong with the dopd in Heartbeat
2.1.3, and I looked up the version number in hb_gui ...

Now failover works, but ocfs2 doesn't work anymore. I'll have to debug
it tomorrow. It appears that the Filesystem OCF script in 2.1.3 mounts
the file system, but the one shipped with 2.1.4 doesn't. The reason
seems to be that in Filesystem_notify the creation of a symlink in the
o2cb config FS fails. It's in line 563, for the curious.
The failing line of code is:

if ! ln -s $OCFS2_CLUSTER_ROOT/node/$entry $OCFS2_FS_ROOT/$entry ; then

My OCFS2 kernel code is version 1.5.0; it's part of Debian stable (= 5.0).
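
For debugging, the symlink step can be tried in isolation. What follows is
only a minimal sketch, not the code from the Filesystem OCF script:
link_node is a hypothetical helper name, and the checks for a missing
source or an already existing link are guesses at possible causes, not a
confirmed diagnosis. It quotes the variables and reports which case was
hit. Note that the o2cb config FS may behave differently from an ordinary
directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Filesystem OCF script).
# Usage: link_node NODE_DIR FS_DIR ENTRY
# Tries to create FS_DIR/ENTRY as a symlink to NODE_DIR/ENTRY and
# reports why it could not, instead of failing silently.
link_node() {
    src="$1/$3"
    dst="$2/$3"

    # The link target has to exist before it can be linked.
    if [ ! -e "$src" ]; then
        echo "source missing: $src" >&2
        return 2
    fi

    # A link left over from an earlier run is treated as success here
    # (an assumption about the desired behaviour).
    if [ -L "$dst" ]; then
        echo "link already present: $dst" >&2
        return 0
    fi

    # Same operation as the failing line, with quoted arguments.
    if ! ln -s "$src" "$dst"; then
        echo "ln -s failed: $src -> $dst" >&2
        return 1
    fi
    return 0
}
```

With the variables from the script it would be called as
link_node "$OCFS2_CLUSTER_ROOT/node" "$OCFS2_FS_ROOT" "$entry"; the
return code then tells the three cases apart.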

 - cl
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
