Hello,

I have tried to build a four-node cluster with heartbeat and pacemaker. 
Everything works as long as the cluster consists of 2 nodes. With 3 or 4 
nodes, error messages suddenly appear during a configuration change:

Sep 06 08:16:56 secomat4 heartbeat: [15956]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15954]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15952]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15952]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15954]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15956]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15958]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15958]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15960]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15960]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15966]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15962]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15964]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15962]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15966]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15964]: ERROR: write_child: write failure on ucast hb2.: Message too long
...

As a result, the nodes lose their intra-cluster connection. This seems to be 
independent of the cluster communication protocol. The example error messages 
above were logged with ucast (I am currently using bcast again, see ha.cf). I 
have seen the same error messages with bcast (bcast with compression did not 
work either). With mcast it was slightly different: I could not get even a 
single node up and running, so I had to switch back to bcast.
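
For reference, the error text itself can be reproduced outside of heartbeat. 
The following is only a minimal sketch, assuming Python 3 on Linux; the 
function name send_datagram and the payload sizes are made up for 
illustration. It shows that "Message too long" is the text for EMSGSIZE, 
which the kernel returns when a single UDP datagram is larger than it can 
send. It says nothing about why heartbeat builds such large messages.

```python
import errno
import socket


def send_datagram(size):
    """Send `size` bytes over UDP on loopback; return the errno on failure, else None."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # let the kernel choose a free port
        sender.sendto(b"x" * size, receiver.getsockname())
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    # 70000 bytes is larger than any IPv4 UDP datagram can be.
    for size in (1000, 70000):
        err = send_datagram(size)
        status = "ok" if err is None else errno.errorcode.get(err, str(err))
        print(size, status)
```

The small datagram goes through, while the oversized one is rejected by the 
kernel before anything reaches the wire, which would match the "Unable to 
send [-1]" part of the log lines.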


Some information about the setup:

ha.cf:
use_logd on
udpport 694
keepalive 2
warntime 10
deadtime 15
initdead 90
bcast hb1 hb2
autojoin none
node secomat1 secomat2 secomat3 secomat4
# debug 1
crm yes
apiauth stonith-ng      uid=root
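
To make the configuration easier to check at a glance, here is a rough 
sketch, assuming Python 3, that reads the ha.cf above and prints the node 
list, the bcast interfaces and a simple timing sanity check. The helper 
parse_ha_cf is hypothetical and not part of heartbeat. It assumes one 
directive per line, "#" comments, and timing values given as plain seconds. 
The keepalive < warntime < deadtime check is my own assumption about a sane 
ordering, not something heartbeat enforces in this form.

```python
def parse_ha_cf(text):
    """Return a dict mapping each directive name to a list of its argument lists."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


# Copy of the ha.cf from this mail.
HA_CF = """\
use_logd on
udpport 694
keepalive 2
warntime 10
deadtime 15
initdead 90
bcast hb1 hb2
autojoin none
node secomat1 secomat2 secomat3 secomat4
# debug 1
crm yes
apiauth stonith-ng      uid=root
"""

if __name__ == "__main__":
    conf = parse_ha_cf(HA_CF)
    nodes = [n for args in conf.get("node", []) for n in args]
    links = [i for args in conf.get("bcast", []) for i in args]
    print("nodes:", len(nodes), nodes)
    print("bcast interfaces:", links)
    # Assumes plain integer seconds; values with an "ms" suffix are not handled.
    keepalive = int(conf["keepalive"][0][0])
    warntime = int(conf["warntime"][0][0])
    deadtime = int(conf["deadtime"][0][0])
    print("timing ok:", keepalive < warntime < deadtime)
```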


crm:
pacemaker


RPMs:
secomat4:~ # rpm -qi heartbeat
Name        : heartbeat                    Relocations: (not relocatable)
Version     : 3.0.3                             Vendor: (none)
Release     : 2.18                          Build Date: Wed Sep 29 18:06:13 2010
Install Date: Wed Apr 13 15:25:12 2011         Build Host: f13.beekhof.net
Group       : Productivity/Clustering/HA    Source RPM: heartbeat-3.0.3-2.18.src.rpm
Size        : 10221126                         License: GPL v2 only; LGPL v2.1 or later
Signature   : (none)
URL         : http://linux-ha.org/
Summary     : Messaging and membership subsystem for High-Availability Linux
Description : ...


secomat4:~ # rpm -qi pacemaker
Name        : pacemaker                    Relocations: (not relocatable)
Version     : 1.1.5                             Vendor: (none)
Release     : 1.1                           Build Date: Mon Feb 14 17:34:14 2011
Install Date: Wed Apr 13 15:25:15 2011         Build Host: f13.beekhof.net
Group       : Productivity/Clustering/HA    Source RPM: pacemaker-1.1.5-1.1.src.rpm
Size        : 13626153                         License: GPLv2+ and LGPLv2+
Signature   : (none)
URL         : http://www.clusterlabs.org
Summary     : Scalable High-Availability cluster resource manager
Description : ...


Hardware:
2  Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz, 2660 MHz
6  cores
24 processors


OS:
secomat4:~ # uname -a
Linux secomat4 2.6.34.10-0.2-default #1 SMP 2011-07-20 18:48:56 +0200 x86_64 x86_64 x86_64 GNU/Linux


Best regards
Claus