A few weeks ago I reported that heartbeat died on one of the cluster machines due to SIGXCPU.
Well, it happened again. Heartbeat died, and now both machines had the shared IP address up. What a god-awful mess! Now they have split brain and the whole nine yards!

I looked at /proc/<heartbeat_pid>/limits and found:

Limit           Soft Limit   Hard Limit   Units
Max cpu time    43           unlimited    seconds

So this process somehow has a soft CPU time limit of 43 seconds set for it, which would account for the SIGXCPU. Does anyone have ANY clue who would set a limit for this process? WTF? Does it set it for itself or what?
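For anyone who wants to run the same check on their own nodes, here is a rough Python sketch that pulls the "Max cpu time" row out of /proc/<pid>/limits. The function names parse_cpu_limit and read_cpu_limit are made up for illustration and are not part of heartbeat. It assumes a Linux /proc layout like the one quoted above.

```python
def parse_cpu_limit(limits_text):
    """Return (soft, hard) from the 'Max cpu time' row of a limits dump, or None."""
    label = "Max cpu time"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # What follows the label is: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def read_cpu_limit(pid):
    """Read /proc/<pid>/limits (Linux only) and return the CPU time limits."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_cpu_limit(fh.read())
```

Calling read_cpu_limit with the heartbeat PID should return a pair such as ("43", "unlimited") on an affected node. Since resource limits are inherited across fork and exec, running the same check on the parent process, or on the shell or init script that started heartbeat, may help narrow down where the limit comes from.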
