On 2009-11-16T11:12:51, Tobias Appel <[email protected]> wrote:

> Hi,
> 
> Well, Nagios informed me today that the root partition of my Heartbeat 
> cluster is getting full. After a short investigation I found that 
> this directory is over 2 GB in size:
> 
> /var/lib/heartbeat/cores/root/
> 
> Over 250 of those files were in there:
> 
> -rw-------  1 root   root   8228864 Nov 16 11:08 core.8251

Yes, you should worry a lot. Look at the gdb backtrace and the logs to
see why this happens.
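
Before opening anything in gdb, it helps to know how many cores there are and
which one is the most recent. A minimal Python sketch, assuming the cores live
in the directory quoted above and are named core.<pid>; the helper name
summarize_cores is made up for illustration. Which binary dumped core is not
known from this thread, so the script only prints a placeholder for it.

```python
from pathlib import Path


def summarize_cores(core_dir):
    """Return (count, total_bytes, newest_path) for core.* files in core_dir."""
    cores = [p for p in Path(core_dir).glob("core.*") if p.is_file()]
    total = sum(p.stat().st_size for p in cores)
    # newest is None when the directory holds no core files
    newest = max(cores, key=lambda p: p.stat().st_mtime, default=None)
    return len(cores), total, newest


if __name__ == "__main__":
    # Path taken from the original report; adjust if your layout differs.
    count, total, newest = summarize_cores("/var/lib/heartbeat/cores/root")
    print(f"{count} core files, {total / 2**20:.1f} MiB total")
    if newest is not None:
        # The crashed executable is an unknown here; `file` usually names it.
        print(f"identify the binary with: file {newest}")
        print(f"then get a backtrace with: gdb <crashed-binary> {newest}")
```

Starting with the newest core keeps the backtrace close to the most recent log
entries, which makes the two easier to correlate.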

> Heartbeat runs fine and stable though. I know that one of the two 
> Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a 
> driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I 
> suppose that's why there is a core dump - because Heartbeat knows that 
> the link is down.

Don't configure the interfaces to go down on link state change; set them
to stay up always. The cluster won't recover cleanly otherwise.


> Also those core dumps happen only on the active node in our two-node 
> cluster. None are on the passive node.

That is pretty bad. Investigate and fix.


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
