Hi,

after getting the port to compile and install smoothly on OpenBSD 
(meanwhile also tested on sparc ;)),
I noticed that I have problems building a cluster with more than two 
nodes. First I decided to use broadcast, with an ha.cf file like this:

autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
bcast le0
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat

I had a three-node cluster configured (one i386, one sparc, one sparc64); 
all nodes joined the cluster, one became DC, the cluster got quorum, etc. 
Marking nodes as active/standby also worked.
Then I tried to add resources via the GUI, which more or less immediately 
rendered the cluster unusable when the DC tried to send out the CIB 
updates to the other nodes:

Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: write failure on bcast 
fxp0.: Message too long
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: glib: Unable to send 
bcast [-1] packet(len=1543): Message too long
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG: Dumping message with 
22 fields
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[0] : [t=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[1] : 
[cib_op=cib_replace]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[2] : 
[cib_delegated_from=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[3] : 
[cib_clientname=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[4] : 
[cib_isreplyto=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[5] : 
[original_cib_op=cib_sync_one]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[6] : 
[cib_update=true]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[7] : 
[(4)cib_calldata=0x806e6010(1345 1011)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[8] : 
[dest=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[9] : [oseq=1]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[10] : [from_id=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[11] : [to_id=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[12] : [client_gen=4]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[13] : 
[(1)destuuid=0x830be490(37 28)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[14] : 
[src=defiant.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[15] : 
[(1)srcuuid=0x7d11bd90(36 27)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[16] : [seq=c9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[17] : [hg=4718cba2]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[18] : [ts=4718ccab]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[19] : [ld=n/a]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[20] : [ttl=6]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[21] : 
[_compression_algorithm=bz2]
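For what it's worth, the "Message too long" is EMSGSIZE from sendto(): the datagram (here 1543 bytes after bz2 compression, plus heartbeat's framing) is bigger than what the kernel will accept on that socket. This is just an illustration with plain sockets, nothing heartbeat-specific; it provokes the same errno by sending a UDP datagram over the 65507-byte payload limit:

```python
import errno
import socket

# UDP payloads are capped at 65507 bytes (65535 minus IP and UDP
# headers); asking sendto() for more than that fails with EMSGSIZE,
# which strerror() renders as "Message too long" -- the same text
# heartbeat logs above.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
try:
    sock.sendto(b"x" * 70000, ("127.0.0.1", 6666))
except OSError as e:
    print("sendto failed:", e.strerror)
finally:
    sock.close()
```

On OpenBSD the limit that matters may well be lower (MTU or socket buffer size rather than the UDP maximum), so a 1543-byte packet can already trip it; I haven't checked where exactly heartbeat's send path hits the error.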

OK, then I tried to use multicast, with this configuration:

autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
mcast fxp0 224.0.0.1 609 2 0
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat

but with that configuration, I saw this in the logfiles:

Oct 19 17:31:11 defiant heartbeat: [23845]: ERROR: write failure on mcast 
fxp0.: Host is down
Oct 19 17:31:13 defiant heartbeat: [23845]: ERROR: glib: Unable to send 
mcast packet [-1]: Host is down
Oct 19 17:31:13 defiant heartbeat: [23845]: ERROR: write failure on mcast 
fxp0.: Host is down
Oct 19 17:31:15 defiant heartbeat: [23845]: ERROR: glib: Unable to send 
mcast packet [-1]: Host is down
Oct 19 17:31:15 defiant heartbeat: [23845]: ERROR: write failure on mcast 
fxp0.: Host is down
Oct 19 17:31:17 defiant heartbeat: [23845]: ERROR: glib: Unable to send 
mcast packet [-1]: Host is down
Oct 19 17:31:17 defiant heartbeat: [23845]: ERROR: write failure on mcast 
fxp0.: Host is down

The interface has the MULTICAST flag set:
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:7b:fc:c0:a0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::20e:7bff:fefc:c0a0%fxp0 prefixlen 64 scopeid 0x2
        inet 10.0.0.5 netmask 0xffffff00 broadcast 10.0.0.255
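The "Host is down" also comes straight out of sendto(), so it can be checked without heartbeat at all; a quick sanity test with a plain socket (group and port taken from the mcast line above, TTL 2 as configured) shows whether the kernel will route to the multicast group on that box:

```python
import socket

# Send one datagram to the multicast group heartbeat would use.
# Any routing/interface problem surfaces here as an OSError with
# the same message heartbeat logs (e.g. "Host is down").
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
try:
    sock.sendto(b"hb-mcast-test", ("224.0.0.1", 609))
    print("multicast send OK")
except OSError as e:
    print("multicast send failed:", e)
finally:
    sock.close()
```

If that fails the same way, the problem is in the OpenBSD multicast setup (interface selection or routing) rather than in heartbeat itself; if it succeeds, heartbeat must be doing something extra on its mcast socket that OpenBSD rejects.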


OK, the next try was to use unicast, with this configuration:

autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
ucast fxp0 defiant.ds9
udpport 6667
ucast fxp0 warbird.ds9
udpport 6668
ucast fxp0 biggame.ds9
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat

but starting up with that configuration, I saw this in the logs:

Oct 19 17:35:51 defiant heartbeat: [24061]: WARN: Realtime scheduling not 
supported on this platform.
Oct 19 17:35:51 defiant heartbeat: [347]: ERROR: glib: ucast: error binding 
socket. Retrying: Address already in use
Oct 19 17:36:00 defiant last message repeated 9 times
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Emergency Shutdown: Master 
Control process died.
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 347 with 
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 14063 with 
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 4586 with 
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Emergency Shutdown(MCP 
dead): Killing ourselves.

Multicast and broadcast work well with the same configuration on openSUSE 
10.2. I tried the unicast configuration there too, but it seems heartbeat 
only honors one udpport statement for all ucast lines. Is that intended, 
or shouldn't it use the different ports as configured above?
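That would also explain the "error binding socket ... Address already in use" above: if every ucast link ends up bound to the same port, each additional bind() fails with EADDRINUSE. Again just a plain-socket illustration (whether heartbeat really reuses one port for all links is my guess from the log, not something I've verified in the code):

```python
import errno
import socket

# Bind a first UDP socket to some free port, then try to bind a
# second socket to the same port. Without SO_REUSEPORT the second
# bind() fails with EADDRINUSE ("Address already in use"), which is
# what each additional ucast link would hit if heartbeat reuses a
# single udpport for all of them.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))          # let the kernel pick a free port
port = a.getsockname()[1]
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    b.bind(("127.0.0.1", port))
except OSError as e:
    print("second bind failed:", e.strerror)
finally:
    a.close()
    b.close()
```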

I think these problems are OpenBSD-specific; I am just wondering whether 
something like this also happens on FreeBSD or Solaris.

A unicast cluster between two OpenBSD nodes doesn't seem to have that 
problem: resources can be created, they migrate between the nodes, etc.

Kind regards,
Sebastian

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
