Hi,

On Mon, Aug 05, 2013 at 03:55:18PM +0200, Marcus Bointon wrote:
> I have two nodes running heartbeat 3.0.5 and pacemaker 1.1.6 (both from the 
> linux-ha lucid ppa). They are running 11 groups each comprising an 
> ocf:heartbeat:IPaddr2, an ocf:heartbeat:SendArp and an ocf:heartbeat:MailTo.
> 
> There is also a mailto resource configured for the overall cluster.
> 
> Despite all these, all the notifications I ever receive look identical:
> 
>     Heartbeat status change: Migrating resource away at Mon Aug  5 13:01:49 
> UTC 2013 from proxy2
>     Command line was:
>     /usr/lib/ocf/resource.d//heartbeat/MailTo stop
> 
> One major omission here is that it doesn't tell me which resource it migrated.
> 
> Is there some way of configuring the cluster itself to send notifications so 
> that I can remove the individual mailto resources?
> 
> Coincidentally (?), I've just started to get this problem:
> 
> Aug  5 11:13:50 proxy1 heartbeat: [2958]: ERROR: glib: ucast_write: Unable to 
> send HBcomm packet eth0 192.168.1.10:694 len=78903 [-1]: Message too long
> Aug  5 11:13:50 proxy1 heartbeat: [2958]: ERROR: write_child: write failure 
> on ucast eth0.: Message too long
> 
> This (well at least I assume it's this) is resulting in resources 
> disappearing, randomly starting and stopping, flip-flopping between nodes, 
> marking nodes as offline and more fun things to keep us awake at night.
> 
> The only explanation I've found for this is here 
> http://comments.gmane.org/gmane.linux.highavailability.pacemaker/10765
> The solutions suggested are to alter compression settings (which we were not 
> using before),

Which compression settings do you have now? I think you should
try different compression settings, as suggested there by Lars.
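
For reference, the relevant ha.cf directives look roughly like this
(values are illustrative only, check them against your setup):

    # /etc/ha.d/ha.cf
    compression bz2            # or zlib, if the bz2 plugin is unavailable
    compression_threshold 2    # compress messages larger than 2 KB

The len=78903 in your log is well over the maximum UDP datagram
payload (65507 bytes), which is presumably why write_child fails with
"Message too long"; compressing large messages below that limit is
what the suggestion is aiming at.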

> migrate to corosync and/or to make the cib smaller, hence the idea of 
> removing the individual mailtos.
> 
> I've run hb_report and that doesn't say anything useful, more or less "it 
> doesn't work".
> 
> I'd like to migrate to corosync if it's better, but I'm extremely wary of 
> touching anything in the cluster.

Do you have a test cluster? If not, you can create one easily
with VMs.
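
If you go the VM route, something along these lines is usually enough
for a throwaway two-node setup (names, ISO path, and bridge here are
placeholders, adjust for your environment):

    for n in testnode1 testnode2; do
        virt-install --name $n --ram 1024 --vcpus 1 \
            --disk path=/var/lib/libvirt/images/$n.img,size=8 \
            --cdrom /path/to/ubuntu-10.04-server.iso \
            --network bridge=br0 --graphics vnc
    done

Then you can copy your ha.cf/CIB over and break things there instead
of in production.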

Thanks,

Dejan

> Marcus
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
