Hi,
after getting the port to compile and install cleanly on OpenBSD (meanwhile
also tested on sparc ;), I noticed that I have problems building a cluster
with more than two nodes. First I decided to use broadcast, with a ha.cf
file like this:
autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
bcast le0
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat
I had a three-node cluster configured (one i386, one sparc, one sparc64).
All nodes joined the cluster, one became DC, it got quorum, etc. Marking
nodes as active/standby also worked.
Then I tried to add resources via the GUI, which more or less immediately
rendered the cluster unusable as soon as the DC tried to send out the CIB
updates to the other nodes:
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: write failure on bcast
fxp0.: Message too long
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: glib: Unable to send
bcast [-1] packet(len=1543): Message too long
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG: Dumping message with
22 fields
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[0] : [t=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[1] :
[cib_op=cib_replace]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[2] :
[cib_delegated_from=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[3] :
[cib_clientname=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[4] :
[cib_isreplyto=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[5] :
[original_cib_op=cib_sync_one]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[6] :
[cib_update=true]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[7] :
[(4)cib_calldata=0x806e6010(1345 1011)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[8] :
[dest=biggame.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[9] : [oseq=1]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[10] : [from_id=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[11] : [to_id=cib]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[12] : [client_gen=4]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[13] :
[(1)destuuid=0x830be490(37 28)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[14] :
[src=defiant.ds9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[15] :
[(1)srcuuid=0x7d11bd90(36 27)]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[16] : [seq=c9]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[17] : [hg=4718cba2]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[18] : [ts=4718ccab]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[19] : [ld=n/a]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[20] : [ttl=6]
Oct 19 17:27:01 defiant heartbeat: [10097]: ERROR: MSG[21] :
[_compression_algorithm=bz2]
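As a quick sanity check on the numbers in that error, here is a minimal
Python sketch (not heartbeat code) that compares the reported packet length
with what fits into a single unfragmented UDP datagram. It assumes an IPv4
header without options, that len=1543 is the UDP payload size, and the mtu
of 1500 that ifconfig reports for fxp0 (shown further below). The helper
names are made up for illustration. As far as I know, BSD-derived stacks
refuse to fragment broadcast datagrams and return "Message too long"
(EMSGSIZE) instead, but treat that as an assumption.

```python
IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8   # fixed UDP header size


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - UDP_HEADER


def fits_unfragmented(payload_len: int, mtu: int) -> bool:
    """True if a payload of this size can be sent without IP fragmentation."""
    return payload_len <= max_udp_payload(mtu)


if __name__ == "__main__":
    mtu = 1500         # from the ifconfig output for fxp0
    packet_len = 1543  # from "Unable to send bcast [-1] packet(len=1543)"
    print(max_udp_payload(mtu))                # 1472
    print(fits_unfragmented(packet_len, mtu))  # False
```

So under these assumptions the CIB update is 71 bytes too large for a single
broadcast datagram, even with bz2 compression enabled.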
OK, then I tried to use multicast, with this configuration:
autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
#mcast fxp0 224.0.0.1 609 2 0
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat
But with that configuration, I saw this in the log files:
Oct 19 17:31:11 defiant heartbeat: [23845]: ERROR: write failure on mcast
fxp0.: Host is down
Oct 19 17:31:13 defiant heartbeat: [23845]: ERROR: glib: Unable to send
mcast packet [-1]: Host is down
Oct 19 17:31:13 defiant heartbeat: [23845]: ERROR: write failure on mcast
fxp0.: Host is down
Oct 19 17:31:15 defiant heartbeat: [23845]: ERROR: glib: Unable to send
mcast packet [-1]: Host is down
Oct 19 17:31:15 defiant heartbeat: [23845]: ERROR: write failure on mcast
fxp0.: Host is down
Oct 19 17:31:17 defiant heartbeat: [23845]: ERROR: glib: Unable to send
mcast packet [-1]: Host is down
Oct 19 17:31:17 defiant heartbeat: [23845]: ERROR: write failure on mcast
fxp0.: Host is down
The interface has the MULTICAST flag set:
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:0e:7b:fc:c0:a0
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet6 fe80::20e:7bff:fefc:c0a0%fxp0 prefixlen 64 scopeid 0x2
inet 10.0.0.5 netmask 0xffffff00 broadcast 10.0.0.255
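To double-check the mcast line from the configuration above (with the
leading # removed), here is a small Python sketch that splits it into its
fields and inspects the group address. The field order (dev, group, port,
ttl, loop) follows the ha.cf documentation as I understand it. The class and
function names are hypothetical, and this is not how heartbeat itself parses
the file.

```python
import ipaddress
from typing import NamedTuple


class McastDirective(NamedTuple):
    dev: str
    group: ipaddress.IPv4Address
    port: int
    ttl: int
    loop: int


def parse_mcast(line: str) -> McastDirective:
    """Parse 'mcast <dev> <group> <port> <ttl> <loop>' (assumed field order)."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "mcast":
        raise ValueError("expected: mcast dev group port ttl loop")
    _, dev, group, port, ttl, loop = fields
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return McastDirective(dev, addr, int(port), int(ttl), int(loop))


def is_link_local_group(addr: ipaddress.IPv4Address) -> bool:
    """True for 224.0.0.0/24, the block routers do not forward."""
    return addr in ipaddress.ip_network("224.0.0.0/24")


if __name__ == "__main__":
    d = parse_mcast("mcast fxp0 224.0.0.1 609 2 0")
    print(d.dev, d.group, d.port, d.ttl, d.loop)
    print(is_link_local_group(d.group))
```

Two things stand out when reading the parsed values: 224.0.0.1 is the
all-hosts group inside the link-local block, so the ttl of 2 has no effect
beyond the local segment, and the mcast line carries its own port (609)
rather than the udpport value of 6666.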
OK, the next try was unicast, with this configuration:
autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 biggame.ds9 warbird.ds9
udpport 6666
ucast fxp0 defiant.ds9
udpport 6667
ucast fxp0 warbird.ds9
udpport 6668
ucast fxp0 biggame.ds9
ping 10.0.0.1 10.11.0.1
debug true
cluster OpenBSD-Heartbeat
But when starting up with that configuration, I saw this in the logs:
Oct 19 17:35:51 defiant heartbeat: [24061]: WARN: Realtime scheduling not
supported on this platform.
Oct 19 17:35:51 defiant heartbeat: [347]: ERROR: glib: ucast: error binding
socket. Retrying: Address already in use
Oct 19 17:36:00 defiant last message repeated 9 times
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Emergency Shutdown: Master
Control process died.
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 347 with
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 14063 with
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Killing pid 4586 with
SIGTERM
Oct 19 17:36:02 defiant heartbeat: [29268]: CRIT: Emergency Shutdown(MCP
dead): Killing ourselves.
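The "Address already in use" message is what one would expect if more than
one ucast socket tries to bind the same local UDP port. The following Python
sketch only illustrates that general socket behaviour on the loopback
address. It is not heartbeat code, and whether heartbeat really binds the
same port once per ucast line is my guess, not something I have verified.

```python
import errno
import socket


def second_bind_errno() -> int:
    """Bind two UDP sockets to the same address and port.

    Returns the errno of the second bind, or 0 if it unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("127.0.0.1", 0))      # let the kernel pick a free port
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same port, no SO_REUSEADDR
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    print(errno.errorcode.get(err, err))
```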
Multicast and broadcast work well with the corresponding configurations on
openSUSE 10.2. I tried the unicast configuration too, but it seems heartbeat
only uses one udpport statement for all ucast lines. Is that intended, or
should it use the different ports as configured above?
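To make the question concrete, here is a small Python sketch that scans a
ha.cf fragment and lists every udpport value and every ucast peer it finds.
It assumes udpport is a single global setting, which is what the observed
behaviour suggests but not something I have confirmed in the heartbeat
source. The function name is made up, and the parsing is deliberately
simplistic.

```python
from typing import List, Tuple


def summarize_transport(conf_text: str) -> Tuple[List[int], List[Tuple[str, str]]]:
    """Collect udpport values and (device, peer) pairs from ha.cf text."""
    ports: List[int] = []
    peers: List[Tuple[str, str]] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "udpport" and len(fields) == 2:
            ports.append(int(fields[1]))
        elif fields[0] == "ucast" and len(fields) == 3:
            peers.append((fields[1], fields[2]))
    return ports, peers


if __name__ == "__main__":
    conf = """\
udpport 6666
ucast fxp0 defiant.ds9
udpport 6667
ucast fxp0 warbird.ds9
udpport 6668
ucast fxp0 biggame.ds9
"""
    ports, peers = summarize_transport(conf)
    print(ports)   # [6666, 6667, 6668]
    print(peers)
    if len(set(ports)) > 1:
        print("several udpport values; only one may take effect (assumption)")
```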
I think these problems are OpenBSD-specific; I am just wondering whether
something like this also happens on FreeBSD or Solaris.
A unicast cluster between two OpenBSD nodes doesn't seem to have that
problem: resources can be created, they migrate between the nodes, etc.
Kind regards,
Sebastian
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/