For a while we've been bothered by a problem on our SLES7 systems where daemons
don't start at boot time.  We thought it was related to the products themselves,
but recently I noticed that it almost always seemed to be the LAST daemon in the
list that didn't start.  I've looked into it, and I think I've identified a race
condition in /etc/init.d/rc that accounts for this.

When rc runs, it logs its messages to /var/log/boot.log via a daemon called
blogd.  After the last init script runs, rc sends QUIT to blogd to shut it down.
On a very fast machine (such as a z900), the last daemon may not have finished
logging its messages when this happens.  The QUIT signal seems to propagate back
to the starting daemon as a SIGHUP, causing it to fail.

We had noticed that the problem occurred most often on our production systems
and rarely on test.  That fits a race: our test system is somewhat slower, so it
is less likely to hit the problem.

The attached patch inserts a 2-second sleep before sending the QUIT signal.
I've tested it multiple times, and it has consistently prevented the last daemon
from failing.

--- rc~ Thu Mar 20 09:04:24 2003
+++ rc  Thu Mar 20 09:04:00 2003
@@ -187,8 +187,9 @@
 fi

 #
-# Stop blogd if running
+# Stop blogd if running (wait 2 seconds for the last daemon to finish logging)
 #
+sleep 2
 killproc -QUIT /sbin/blogd

 #
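
A fixed sleep is the simplest fix.  One possible refinement, sketched here
under assumptions and not part of the patch above, is to wait until
/var/log/boot.log has stopped growing for a short quiet period, with a hard
upper bound so boot can never hang at this point.  It assumes blogd writes each
message to the log file as it arrives; the helper name and its parameters are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wait until a log file has not grown for QUIET
# consecutive seconds, but never wait longer than MAX seconds in total.
# Returns 0 if the log went quiet (or does not exist), 1 on timeout.
# Note: POSIX sh has no "local", so these variables are global.
wait_for_quiet_log() {
    log=$1
    quiet=${2:-2}   # consecutive seconds with no growth required
    max=${3:-10}    # hard upper bound on the total wait, in seconds

    # Nothing to wait for if the log file does not exist.
    [ -e "$log" ] || return 0

    waited=0
    still=0
    last=$(wc -c < "$log")
    while [ "$waited" -lt "$max" ] && [ "$still" -lt "$quiet" ]; do
        sleep 1
        waited=$((waited + 1))
        now=$(wc -c < "$log")
        if [ "$now" -eq "$last" ]; then
            still=$((still + 1))   # no new output during this second
        else
            still=0                # log grew: restart the quiet count
            last=$now
        fi
    done
    [ "$still" -ge "$quiet" ]
}
```

In rc this would replace `sleep 2` with something like
`wait_for_quiet_log /var/log/boot.log 2 10` just before
`killproc -QUIT /sbin/blogd`.  It is still a heuristic: a daemon that pauses
longer than the quiet period before its last message can still lose the race.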
