Hi,

On Tue, Jan 04, 2011 at 07:47:10AM -0600, Igor Chudov wrote:
> Further reading indicates that heartbeat itself sets a limit for itself
> every so often.

True.
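A process that keeps moving its own soft CPU limit forward can be sketched with Python's standard `resource` module. This illustrates the general RLIMIT_CPU technique under assumed values. It is not Heartbeat's actual code, which is written in C. `ALLOWANCE_SECONDS` and the function names are hypothetical. On Linux, once a process crosses its soft RLIMIT_CPU the kernel sends it SIGXCPU, and the default action for that signal is to terminate the process.

```python
import resource

# Hypothetical allowance: CPU seconds the process may burn between two
# refreshes of its own limit. Heartbeat's real value and logic differ.
ALLOWANCE_SECONDS = 10


def cpu_seconds_used() -> int:
    """CPU time (user + system) used so far, truncated to whole seconds plus one."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return int(usage.ru_utime + usage.ru_stime) + 1


def refresh_cpu_limit(allowance: int = ALLOWANCE_SECONDS) -> int:
    """Move the soft RLIMIT_CPU to 'used so far + allowance'; keep the hard limit."""
    _old_soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    new_soft = cpu_seconds_used() + allowance
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process may not raise its soft limit above the hard limit.
        new_soft = min(new_soft, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))
    return new_soft


def soft_cpu_limit_from_proc(pid: str = "self") -> str:
    """Return the 'Max cpu time' soft limit column from /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max cpu time"):
                # Columns: "Max cpu time", soft limit, hard limit, units
                return line.split()[3]
    raise LookupError(f"no 'Max cpu time' line in /proc/{pid}/limits")
```

In this sketch a daemon would call `refresh_cpu_limit()` from its main loop, for example once per heartbeat interval. If a bug keeps it spinning without reaching that call, it burns through the allowance and the kernel stops it with SIGXCPU. That would also explain why the soft value shown in `/proc/<pid>/limits` changes over time while the hard limit stays `unlimited`.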

> Then it exceeds the limit (probably due to a bug). I am sure that that's why
> whoever wrote heartbeat set a CPU limit instead of fixing their bugs.

Wrong conclusion.

> Then it dies with SIGXCPU, leaving everything in an extremely messy state,
> leading to split brain and destruction of shared resources (DRBD data).
> 
> I was trying to be a little patient. A little forgiving. I must say that my
> patience is rapidly running out.
> 
> I absolutely cannot use this "solution" as a basis of a high reliability
> cluster, because it is the opposite of reliability.
> 
> We had an old cluster that has worked very well with heartbeat V1. But it is
> getting old, the disks are wearing out, the fans are not getting newer, etc.
> I set up a new cluster in the summer, but never fully trusted it, and it looks
> like I will not be able to trust it. We never completed a switchover.

You can open a Bugzilla report with logs, though Heartbeat is actually
better supported by paid contracts these days.

> At this point I feel rather desperate. Perhaps I should give "pacemaker"
> another go. I really have no idea and I am running out of options.

Well, with a v1 configuration you're more or less on your own.
Though I'm sure there are still quite a few v1 clusters
running.

Thanks,

Dejan


> i
> 
> On Tue, Jan 4, 2011 at 7:32 AM, Igor Chudov <[email protected]> wrote:
> 
> > A few weeks ago I reported that heartbeat died on one of the cluster machines
> > due to SIGXCPU.
> >
> > Well, it happened again. Heartbeat died, and now both machines have the shared
> > IP address up, what a god awful mess!!!
> >
> > Now they have split brain and the whole nine yards!
> >
> > I looked at /proc/<heartbeat_pid>/limits and found:
> >
> > Limit                     Soft Limit           Hard Limit           Units
> > Max cpu time              43                   unlimited            seconds
> >
> >
> > So, this process somehow has a limit set for it.
> >
> > Does anyone have ANY clue who would set a limit for this process??? WTF?
> > Does it do it for itself or what?
> >
> >
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems