Hi,

On Tue, Jan 04, 2011 at 07:47:10AM -0600, Igor Chudov wrote:
> Further reading indicates that heartbeat itself sets a limit for itself
> every so often.

True.

> Then it exceeds the limit (probably due to a bug). I am sure that's why
> whoever wrote heartbeat set a CPU limit instead of fixing their bugs.

Wrong conclusion.

> Then it dies with SIGXCPU, leaving everything in an extremely messy state,
> leading to split brain and destruction of shared resources (DRBD data).
>
> I was trying to be a little patient. A little forgiving. I must say that my
> patience is rapidly running out.
>
> I absolutely cannot use this "solution" as a basis of a high reliability
> cluster, because it is the opposite of reliability.
>
> We had an old cluster that works very well with heartbeat V1. But it is
> getting old, the disks are wearing out, the fans are not getting newer, etc.
> I set up a new cluster in summer, but never fully trusted it, and it looks
> like I will not be able to trust it. We never completed a switchover.

You can open a bugzilla with logs, though Heartbeat is actually better
supported by paid contracts these days.

> At this point I feel rather desperate. Perhaps I should give "pacemaker"
> another go. I really have no idea and I am running out of options.

Well, with a v1 configuration you're more or less on your own, though I'm
sure there are still quite a few v1 clusters running.

Thanks,

Dejan

> i
>
> On Tue, Jan 4, 2011 at 7:32 AM, Igor Chudov <[email protected]> wrote:
>
> > A few weeks ago I reported that heartbeat died on one of the cluster
> > machines due to SIGXCPU.
> >
> > Well, it happened again. Heartbeat died, and now both machines had the
> > shared IP address up. What a god awful mess!!!
> >
> > Now they have split brain and the whole nine yards!
> >
> > I looked at /proc/<heartbeat_pid>/limits and found:
> >
> > Limit                     Soft Limit           Hard Limit           Units
> > Max cpu time              43                   unlimited            seconds
> >
> > So, this process somehow has a limit set for it.
> >
> > Does anyone have ANY clue who would set a limit for this process??? WTF?
> > Does it do it for itself or what?
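To illustrate the mechanism discussed in this thread (a process putting a CPU-time limit on itself, then receiving SIGXCPU if it overruns it), here is a minimal sketch in Python. It is not heartbeat's code: `parse_max_cpu_time` and `renew_cpu_allowance` are hypothetical helpers, and the "rolling allowance" scheme is an assumption about how such a self-imposed watchdog could work. The first helper reads the "Max cpu time" row from a /proc/<pid>/limits dump like the one quoted above; the second raises the soft RLIMIT_CPU to "CPU used so far plus an allowance", so that a process stuck in a busy loop between renewals would be sent SIGXCPU, whose default action is to terminate the process.

```python
import math
import resource


def parse_max_cpu_time(limits_text):
    """Return (soft, hard) from the 'Max cpu time' row of a /proc/<pid>/limits dump.

    Each value is an int number of seconds, or None for 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max cpu time"):
            # Remaining fields: [soft, hard, units], e.g. ['43', 'unlimited', 'seconds']
            fields = line[len("Max cpu time"):].split()
            soft, hard = fields[0], fields[1]
            return (None if soft == "unlimited" else int(soft),
                    None if hard == "unlimited" else int(hard))
    raise ValueError("no 'Max cpu time' row found")


def renew_cpu_allowance(allowance_s):
    """Hypothetical self-watchdog: permit `allowance_s` more CPU seconds from now.

    Sets the soft RLIMIT_CPU to (CPU consumed so far + allowance), leaving the
    hard limit unchanged. If the process uses up the allowance before the next
    renewal, the kernel sends it SIGXCPU.
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    used = math.ceil(usage.ru_utime + usage.ru_stime)  # user + system CPU seconds
    _old_soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    new_soft = used + allowance_s
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # an unprivileged soft limit may not exceed hard
    resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))
    return new_soft
```

Under this assumed scheme, a soft limit of a few dozen seconds with an unlimited hard limit is the kind of value that would show up in /proc/<pid>/limits between renewals. The kill would then indicate that the process spun on the CPU, rather than that the limit itself is the bug.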
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
