I've been testing auto_failback in our 2.0.7-based cluster, and
have found that failback sometimes doesn't occur.
We're managing a virtual IP via a haresources file on a Red Hat 4
box.
What I tracked down was that if the box powered down too quickly
for heartbeat to clean up, a PID file was left in place:
# ls -ld /usr/local/var/run/heartbeat.pid
-rw-r----- 1 root root 11 May 22 16:44 /usr/local/var/run/heartbeat.pid
# cat /usr/local/var/run/heartbeat.pid
3215
But, when heartbeat tries to start after a reboot:
May 22 16:46:41 sqe-50 heartbeat: [3214]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
May 22 16:46:41 sqe-50 heartbeat: [3214]: info: **************************
May 22 16:46:41 sqe-50 heartbeat: [3214]: info: Configuration validated. Starting heartbeat 2.0.7
May 22 16:46:41 sqe-50 heartbeat: [3214]: info: heartbeat: already running [pid 3215].
What I see in make_daemon() is a check for this file and its contents:
    /* See if heartbeat is already running... */
    if ((pid=cl_read_pidfile(PIDFILE)) > 0 && pid != getpid()) {
        cl_log(LOG_INFO, "%s: already running [pid %ld]."
        ,   cmdname, pid);
        exit(LSB_EXIT_OK);
    }
But there's no check here to ensure the recorded PID is not stale.
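
To illustrate the kind of check I mean, here is a rough sketch, not a
patch against heartbeat. The helper names pid_is_alive() and
another_instance_running() are made up for this example, and it assumes
a POSIX system where kill() with signal 0 only probes for the process:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of heartbeat): returns 1 if a process
 * with the given PID currently exists, 0 otherwise.
 */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0) {
        return 0;
    }
    /* Signal 0 sends nothing; only existence and permissions are checked. */
    if (kill(pid, 0) == 0) {
        return 1;
    }
    /* EPERM means the process exists but we are not allowed to signal it. */
    return errno == EPERM;
}

/*
 * Hypothetical variant of the make_daemon() condition: the recorded PID
 * counts as "already running" only if it is not ours and is still alive.
 */
static int another_instance_running(pid_t recorded)
{
    return recorded > 0 && recorded != getpid() && pid_is_alive(recorded);
}
```

This alone would not be enough after a reboot, since the recorded PID
may by then belong to some unrelated process; a fuller check would
probably also need to confirm that the process is actually heartbeat.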
Have others seen this? This code seems to be in 2.0.8 as well...
--
Brian Reichert <[EMAIL PROTECTED]>
55 Crystal Ave. #286 Daytime number: (603) 434-6842
Derry NH 03038-1725 USA BSD admin/developer at large