i've run into a very weird problem involving heartbeat, dopd, drbd,
and the linux bonding driver. first, my setup:
OS: centos 5.2 running kernel 2.6.18-92.1.10.el5PAE
heartbeat: 2.1.3 w/ dopd patch applied
drbd: 8.2.6 (for the installed kernel)
the 'w/ dopd patch applied' refers to this patch that fixes a critical
breakage in dopd:
http://hg.linux-ha.org/dev/rev/47f60bebe7b2
note that the problem i describe below happens with or without the
patch.
basically, when i turn on dopd in my ha.cf file with the following
directives:
    respawn hacluster /usr/lib/heartbeat/dopd
    apiauth dopd gid=haclient uid=hacluster
the bonding setup that i have for my drbd connection tanks with
messages like these:
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth3.
Aug 31 19:53:25 beast kernel: bonding: bond1: making interface eth3 the new active one.
Aug 31 19:53:25 beast kernel: bonding: bond1: first active interface up!
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth4.
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth5.
Aug 31 20:18:27 beast kernel: bonding: bond1: link status definitely down for interface eth5, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: now running without any active interface !
Aug 31 20:18:28 beast kernel: bonding: bond1: Error: found a client with no channel in the client's hash table
the bond has three slave interfaces, using three crossover cables
that run directly from one machine to the other. the bond works
perfectly with heartbeat and drbd _until_ i add those directives
listed above to turn on dopd, at which point it tanks with those error
messages.
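in case it helps anyone line this up against their own logs, here is a rough python sketch that pulls the "link status definitely up/down" events out of the excerpt above and works out how long each slave stayed up before the bond dropped it. it is only an illustration: the function names are made up, the regex only covers the message format shown in this post, and the year is an assumption (syslog timestamps don't carry one).

```python
import re
from datetime import datetime

# sample lines copied from the log excerpt in this post (unwrapped)
LOG = """\
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth3.
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth4.
Aug 31 19:53:25 beast kernel: bonding: bond1: link status definitely up for interface eth5.
Aug 31 20:18:27 beast kernel: bonding: bond1: link status definitely down for interface eth5, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Aug 31 20:18:28 beast kernel: bonding: bond1: now running without any active interface !
"""

# matches only the "link status definitely up/down" messages shown above
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: bonding: "
    r"(?P<bond>\w+): link status definitely (?P<state>up|down) "
    r"for interface (?P<iface>\w+)"
)


def link_events(text, year=2008):
    """Return (timestamp, bond, interface, state) tuples in log order.

    The year is an assumption: syslog lines do not include one.
    """
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m is None:
            continue  # skip bonding messages that are not link up/down events
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((when, m["bond"], m["iface"], m["state"]))
    return events


def uptime_per_slave(events):
    """Seconds between each slave's 'up' event and its next 'down' event."""
    went_up, result = {}, {}
    for when, _bond, iface, state in events:
        if state == "up":
            went_up[iface] = when
        elif iface in went_up:
            result[iface] = (when - went_up.pop(iface)).total_seconds()
    return result


if __name__ == "__main__":
    for iface, seconds in sorted(uptime_per_slave(link_events(LOG)).items()):
        print(f"{iface}: up for {seconds:.0f}s before being disabled")
```

on the excerpt above this shows all three slaves staying up for roughly 25 minutes and then dropping within a second of each other.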
there's nothing special or interesting in the heartbeat logs that i
can see -- the problem occurs when heartbeat starts dopd.
in the meantime i've simply gotten rid of the bond, and heartbeat/dopd/
drbd are all running happily together. i sure would like to have my
cake and eat it, too, though -- so if anybody has any suggestions
about how i can fix this, i'd surely appreciate it!