Hi,

On Mon, Aug 05, 2013 at 03:55:18PM +0200, Marcus Bointon wrote:
> I have two nodes running heartbeat 3.0.5 and pacemaker 1.1.6 (both from the
> linux-ha lucid ppa). They are running 11 groups, each comprising an
> ocf:heartbeat:IPaddr2, an ocf:heartbeat:SendArp and an ocf:heartbeat:MailTo.
>
> There is also a mailto resource configured for the overall cluster.
>
> Despite all these, all the notifications I ever receive look identical:
>
>     Heartbeat status change: Migrating resource away at Mon Aug 5 13:01:49
>     UTC 2013 from proxy2
>     Command line was:
>     /usr/lib/ocf/resource.d//heartbeat/MailTo stop
>
> One major omission here is that it doesn't tell me which resource it migrated.
>
> Is there some way of configuring the cluster itself to send notifications so
> that I can remove the individual mailto resources?
>
> Coincidentally (?), I've just started to get this problem:
>
>     Aug 5 11:13:50 proxy1 heartbeat: [2958]: ERROR: glib: ucast_write: Unable to
>     send HBcomm packet eth0 192.168.1.10:694 len=78903 [-1]: Message too long
>     Aug 5 11:13:50 proxy1 heartbeat: [2958]: ERROR: write_child: write failure
>     on ucast eth0.: Message too long
>
> This (well, at least I assume it's this) is resulting in resources
> disappearing, randomly starting and stopping, flip-flopping between nodes,
> nodes being marked offline, and other fun things to keep us awake at night.
>
> The only explanation I've found for this is here:
> http://comments.gmane.org/gmane.linux.highavailability.pacemaker/10765
> The solutions suggested are to alter compression settings (which we were not
> using before),
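[Not part of the original thread: a quick illustration of where the "Message too long" error above comes from. Heartbeat's ucast transport sends each cluster message as a single UDP datagram, and a UDP payload cannot exceed 65507 bytes, so the 78903-byte packet in the log is rejected by the kernel with EMSGSIZE before it ever reaches the wire. A minimal sketch; the address and port here are arbitrary:]

```python
# Sending a datagram the size of the failing heartbeat packet (len=78903)
# reproduces the EMSGSIZE ("Message too long") error from the log.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    # Port 9 (discard) on loopback; the send fails before delivery matters.
    sock.sendto(b"x" * 78903, ("127.0.0.1", 9))
    result = "sent"
except OSError:
    result = "too long"  # EMSGSIZE: payload exceeds the 65507-byte UDP limit
finally:
    sock.close()
print(result)
```

On Linux this prints "too long", which is why compression (or a smaller CIB) is the fix: the message must shrink below the datagram limit.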
Which compression settings do you have now? I think you should try
different compression settings, as Lars suggested there.

> migrate to corosync, and/or to make the cib smaller, hence the idea of
> removing the individual mailtos.
>
> I've run hb_report and that doesn't say anything useful, more or less "it
> doesn't work".
>
> I'd like to migrate to corosync if it's better, but I'm extremely wary of
> touching anything in the cluster.

Do you have a test cluster? If not, you can create one easily with VMs.

Thanks,

Dejan

> Marcus

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
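[Not part of the original thread: for reference, message compression in heartbeat is configured in ha.cf. A sketch only; the values below are examples, not a recommendation, and the chosen plugin must be available in your build and identical on both nodes:]

```text
# /etc/ha.d/ha.cf (excerpt) -- example values only
compression           bz2   # or zlib, depending on which plugins are built
compression_threshold 20    # compress messages larger than 20 KB
```

With a threshold well below the ~64 KB UDP datagram limit, large CIB updates are compressed before heartbeat tries to send them over ucast.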
