Re: [Linux-HA] Coredump on active node when other node joins in

Bernhard Limbach Mon, 14 May 2007 08:45:34 -0700

Hi,

Update:


The runlevel change was not a solution after all: the coredump happened again.
(I had assumed some kind of race condition and thought that deferring the start
of heartbeat would avoid it.)

I now suspect that it is caused by the missing hb_generation file on the
freshly installed server. After re-reading the docs I wonder why I didn't run
into the replay attack protection anyway.

I have now tried "hbgenmethod time", and this time my reinstall procedure
succeeded without a coredump (only one try so far; I'm getting tired of
repeating this installation over and over...).
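
For anyone who wants to reproduce this: the change amounts to a single
directive in ha.cf. The fragment below is only a minimal sketch; the file
path is the usual default and an assumption here, and the rest of the
configuration is assumed to stay unchanged.

```
# /etc/ha.d/ha.cf (fragment; path assumed)
# Derive the heartbeat generation number from the time of day
# rather than from the hb_generation file.
hbgenmethod time
```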

Still, I am a little concerned that a newly joining node, whether legitimate or
not and whether well-behaved or not, can cause my running heartbeat to fail in
such a dramatic way...

Regards,
Bernhard



-------- Original Message --------
Date: Fri, 11 May 2007 13:07:28 +0200
From: "Bernhard Limbach" <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Linux-HA] Coredump on active node when other node joins in

> Hi,
> 
> I'm currently practicing the reinstallation of one cluster node
> (maintenance procedure to replace a server), while the other node is
> running and providing the services.
> 
> When the freshly installed node comes up, heartbeat on the primary node
> dumps core and does an emergency shutdown.
> 
> Freshly installed means that, in addition to the config files, the only
> file in /var/lib/heartbeat and below that I have restored is hb_uuid.
> Everything else there should be updated automatically, as far as I have
> understood the concepts...
> 
> The error happened reproducibly when heartbeat was started in runlevel 2.
> 
> When started in runlevel 5 it did not happen (that's now my current
> workaround).
> 
> 
> The error also did not happen when one of the nodes was rebooted normally,
> i.e. after it had already been online in the cluster.
> 
> 
> The setup is a simple 2-node cluster with:
> - heartbeat-2.0.8 compiled from the tarball that is available on the
> download page.
> - Fedora Core 5 with kernel 2.6.20-1.2316.fc5smp
> 
> 
> Attached you will find: ha.cf, cib.xml, the logs of both nodes and the
> backtrace of the core-dump (if I managed to extract it correctly...).
> 
> Please note also that after the emergency shutdown two heartbeat processes
> still were running:
> 
> DMM1:/root # ps -ef |grep heartbeat
> root     17535     1  0 07:31 ?        00:00:00 /usr/lib/heartbeat/lrmd -r
> 17       17537     1  0 07:31 ?        00:00:00 /usr/lib/heartbeat/attrd
> 
> 
> Since starting a freshly installed server in runlevel 5 is a workable
> workaround for me, I merely wanted to inform you about this error. Maybe it
> helps to track down another of those little bugs...
> 
> Best regards,
> Bernhard

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
