Hello, I have tried to build a four-node cluster with heartbeat and pacemaker. Everything works as long as the cluster consists of 2 nodes. With 3 or 4 nodes, error messages suddenly appear during a configuration change:
Sep 06 08:16:56 secomat4 heartbeat: [15956]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15954]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15952]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15952]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15954]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15956]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15958]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15958]: ERROR: write_child: write failure on ucast hb1.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15960]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15960]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15966]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15962]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15964]: ERROR: glib: Unable to send [-1] ucast packet: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15962]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15966]: ERROR: write_child: write failure on ucast hb2.: Message too long
Sep 06 08:16:56 secomat4 heartbeat: [15964]: ERROR: write_child: write failure on ucast hb2.: Message too long
...

The result is that the nodes lose their intra-cluster connection. This seems to be independent of the cluster communication protocol. The example error messages above are from ucast (currently I use bcast again, see ha.cf).
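
A side note on the error text itself: "Message too long" is the standard system message for errno EMSGSIZE, which a UDP send returns when the datagram is larger than the socket can transmit. The sketch below is not heartbeat code. It only reproduces the same errno with a plain Python UDP socket on Linux. The function name, the 70000-byte payload and the 127.0.0.1:9 target are made up for illustration, and whether the cluster messages here really exceed the datagram limit is an assumption that the logs alone do not prove.

```python
import errno
import socket


def oversized_udp_send_errno(payload_size: int = 70000) -> int:
    """Try to send one UDP datagram of payload_size bytes.

    Returns 0 if the send succeeded, otherwise the errno of the failure.
    The default of 70000 bytes is deliberately above the 65535-byte limit
    of a single IPv4 UDP datagram.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # 127.0.0.1:9 is an arbitrary target; nothing needs to listen there,
        # because the oversized send is rejected before anything is transmitted.
        sock.sendto(b"x" * payload_size, ("127.0.0.1", 9))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    code = oversized_udp_send_errno()
    # Report the symbolic errno name and whether it is EMSGSIZE.
    print(errno.errorcode.get(code), code == errno.EMSGSIZE)
```

If the same condition applies inside heartbeat, it would fit the observation that the errors only show up once the cluster, and with it the amount of data exchanged on a configuration change, grows beyond 2 nodes. That remains a guess.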

I have seen the same error messages with bcast (bcast with compression didn't work either). With mcast it was slightly different: I couldn't get a single node up and running, so I had to switch back to bcast.

Some information about the setup:

ha.cf:

    use_logd on
    udpport 694
    keepalive 2
    warntime 10
    deadtime 15
    initdead 90
    bcast hb1 hb2
    autojoin none
    node secomat1 secomat2 secomat3 secomat4
    # debug 1
    crm yes
    apiauth stonith-ng uid=root

crm: pacemaker

RPMs:

    secomat4:~ # rpm -qi heartbeat
    Name        : heartbeat                    Relocations: (not relocatable)
    Version     : 3.0.3                             Vendor: (none)
    Release     : 2.18                          Build Date: Wed Sep 29 18:06:13 2010
    Install Date: Wed Apr 13 15:25:12 2011      Build Host: f13.beekhof.net
    Group       : Productivity/Clustering/HA    Source RPM: heartbeat-3.0.3-2.18.src.rpm
    Size        : 10221126                         License: GPL v2 only; LGPL v2.1 or later
    Signature   : (none)
    URL         : http://linux-ha.org/
    Summary     : Messaging and membership subsystem for High-Availability Linux
    Description : ...

    secomat4:~ # rpm -qi pacemaker
    Name        : pacemaker                    Relocations: (not relocatable)
    Version     : 1.1.5                             Vendor: (none)
    Release     : 1.1                           Build Date: Mon Feb 14 17:34:14 2011
    Install Date: Wed Apr 13 15:25:15 2011      Build Host: f13.beekhof.net
    Group       : Productivity/Clustering/HA    Source RPM: pacemaker-1.1.5-1.1.src.rpm
    Size        : 13626153                         License: GPLv2+ and LGPLv2+
    Signature   : (none)
    URL         : http://www.clusterlabs.org
    Summary     : Scalable High-Availability cluster resource manager
    Description : ...

Hardware:

    2 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, 2660 MHz
    6 cores
    24 processors

OS:

    secomat4:~ # uname -a
    Linux secomat4 2.6.34.10-0.2-default #1 SMP 2011-07-20 18:48:56 +0200 x86_64 x86_64 x86_64 GNU/Linux

Best regards
Claus
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
