Hi,

Update: the runlevel change was not a solution. I had assumed some kind of race condition and thought that deferring the start of heartbeat would avoid it, but the crash happened again.

I now suspect that it is caused by the missing hb_generation file on the freshly installed server. After re-reading the docs, I wonder why I didn't run into the replay attack protection anyway.

I have now tried "hbgenmethod time", and my reinstall procedure succeeded without a core dump (only one try so far; I'm getting tired of doing this installation over and over again...).

Still, I am a little concerned that a newly joining node, whether legitimate or not and whether it behaves well or not, can make my running heartbeat fail in such a dramatic way...

Regards,
Bernhard

-------- Original Message --------
Date: Fri, 11 May 2007 13:07:28 +0200
From: "Bernhard Limbach" <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Linux-HA] Coredump on active node when other node joins in

> Hi,
>
> I'm currently practicing the reinstallation of one cluster node
> (a maintenance procedure for replacing a server) while the other node is
> running and providing the services.
>
> When the freshly installed node comes up, heartbeat on the primary node
> dumps core and does an emergency shutdown.
>
> "Freshly installed" means that, apart from the config files, the only file
> under /var/lib/heartbeat that I have restored is hb_uuid. Everything else
> there should be updated automatically, as far as I have understood the
> concepts...
>
> The error happened (reproducibly) when heartbeat was started in runlevel 2.
>
> When started in runlevel 5, it did not happen (that's my current
> workaround).
>
> The error also did not happen when one of the nodes was rebooted normally,
> i.e. after it had already been online in the cluster.
>
> The setup is a simple 2-node cluster with:
> - heartbeat-2.0.8 compiled from the tarball that is available on the
>   download page
> - Fedora Core 5 with kernel 2.6.20-1.2316.fc5smp
>
> Attached you will find ha.cf, cib.xml, the logs of both nodes and the
> backtrace of the core dump (if I managed to extract it correctly...).
>
> Please also note that after the emergency shutdown two heartbeat processes
> were still running:
>
> DMM1:/root # ps -ef |grep heartbeat
> root 17535 1 0 07:31 ? 00:00:00 /usr/lib/heartbeat/lrmd -r
> 17 17537 1 0 07:31 ? 00:00:00 /usr/lib/heartbeat/attrd
>
> Since starting a freshly installed server in runlevel 5 is a workable
> workaround for me, I merely wanted to report this error; maybe it helps to
> track down another of those little bugs...
>
> Best regards,
> Bernhard
