On 2009-11-16T11:12:51, Tobias Appel <[email protected]> wrote:
> Hi,
>
> Well, Nagios informed me today that the root partition of my Heartbeat
> cluster is getting full. After a short investigation I found that
> this directory is over 2 GB in size:
>
> /var/lib/heartbeat/cores/root/
>
> Over 250 of those files were in there:
>
> -rw------- 1 root root 8228864 Nov 16 11:08 core.8251

Yes, you should worry a lot. Look at the gdb backtrace and the logs to
see why this happens.

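As a rough starting point, something like the following sketch could
count the core files and build a batch-mode gdb invocation for the most
recent one. The function names are made up for illustration, and the
path of the binary that dumped core is an assumption you have to supply
yourself (gdb prints "Core was generated by ..." when it loads a core,
which helps to confirm you picked the right one).

```python
import subprocess
from pathlib import Path


def summarize_cores(core_dir):
    """Return (count, total_bytes, newest_path) for core.* files in core_dir."""
    # Sort by modification time so the last entry is the most recent dump.
    cores = sorted(Path(core_dir).glob("core.*"), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in cores)
    newest = cores[-1] if cores else None
    return len(cores), total, newest


def gdb_backtrace_command(binary, core):
    """Build a non-interactive gdb command that prints all thread backtraces."""
    # --batch makes gdb exit after running the -ex command.
    return ["gdb", "--batch", "-ex", "thread apply all bt", str(binary), str(core)]


def print_newest_backtrace(core_dir, binary):
    """Print a summary of core_dir and run gdb on the newest core, if any."""
    count, total, newest = summarize_cores(core_dir)
    print(f"{count} core files, {total / 2**20:.1f} MiB total")
    if newest is not None:
        # check=False: a non-zero gdb exit status should not raise here.
        subprocess.run(gdb_backtrace_command(binary, newest), check=False)
```

Calling print_newest_backtrace("/var/lib/heartbeat/cores/root", BINARY)
with BINARY set to whichever executable actually crashed would then
print one backtrace to compare against the logs from around 11:08.
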
> Heartbeat runs fine and stable though. I know that one of the two
> Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a
> driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I
> suppose that's why there is a core dump - because Heartbeat knows that
> the link is down.

Don't configure the interfaces to go down on link state change; set them
to always up. The cluster won't recover cleanly otherwise.

> Also those core dumps happen only on the active node in our two-node
> cluster. None are on the passive node.

That is pretty bad. Investigate and fix.

Regards,
Lars
--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde