(Resending with a proper subject, sorry; I'm using digest mode.)

Thanks for the prompt reply. See my comments below. I must stress that I'm under a lot of pressure over this, and it's critical to solve these issues ASAP, so sorry if I sound a bit panicked :S
On Fri, Sep 5, 2008 at 9:00 PM, <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On Thu, Sep 04, 2008 at 04:57:39PM -0400, Itay Donenhirsch wrote:
> > Hi all,
> > I've got a serious crash in heartbeat.
> > The scenario is that I start with 4 stations: ibp1 ibp2 ibp3 and
> > ibp-standby.
> > I get all hosts connected and online.
> > I then shutdown the network switch and brings back up.
> > The some of the stations (many times all of them) keep rebooting with:
> > heartbeat: [5013]: EMERG: Rebooting system. Reason:
> > /usr/lib64/heartbeat/crmd
>
> That's a recovery measure.
> crmd is crashing or being killed. Perhaps you should upgrade.
>

I investigated further, and it seems that the heartbeat process kills all the other processes. It happens when all the nodes are coming back up, and then I see this message in the log:

Sep 4 22:56:46 [EMAIL PROTECTED] heartbeat: [3289]: ERROR: Cannot write to media pipe 0: Resource temporarily unavailable

That looked very odd to me, and I found this bug report:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=1697

I saw in the code that this patch wasn't applied in 2.1.4 and tried to apply it myself, but it didn't solve anything. I'm not sure I did it right, as the code has changed a bit since then.

Another thing I noticed is that /var/heartbeat/pengine fills up (1000s of files).

About the upgrade: I'm already at 2.1.4 (sorry for not mentioning it before).

>
> > This keeps going in a loop untill I stop heartbeat before it reboots
> > again.
>
> Replace crm yes with crm respawn in ha.cf until you fix it.
>

Won't that just cause the CRM to restart in a loop?

>
> > Please help, I really don't know what to do.
> > Attached is the end of the log file of station ibp3. Some EMERGs are
> > visible there.
> >
> > Thanks,
> > Itay
>
> Thanks,
>
> Dejan
