For a while we've been bothered by a problem on our SLES7 systems where daemons don't start at boot time. We thought it was related to the products themselves, but recently I noticed that it almost always seemed to be the LAST daemon in the list that didn't start. I've looked into it, and I think I've identified a race condition in /etc/init.d/rc that accounts for this.
When rc runs, it logs its messages to /var/log/boot.log via a daemon called blogd. After the last script runs, rc sends QUIT to blogd to shut it down. On a very fast machine (such as a z900), it's possible that the last daemon hasn't finished logging its messages when this happens. The QUIT signal seems to propagate back to the starting daemon as a SIGHUP, causing it to fail.

We had noticed that the problem occurred most frequently on our production systems and only rarely on test. This is consistent with a race: our test system is somewhat slower, so it is less likely to hit the window.

The attached patch inserts a 2 second sleep before sending the kill signal. I've tested it multiple times, and it has consistently prevented the last daemon from failing.

--- rc~	Thu Mar 20 09:04:24 2003
+++ rc	Thu Mar 20 09:04:00 2003
@@ -187,8 +187,9 @@
 fi
 
 #
-# Stop blogd if running
+# Stop blogd if running (wait 2 seconds for last guy to finish logging)
 #
+sleep 2
 killproc -QUIT /sbin/blogd
 
 #
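A fixed sleep is only a guess at how long the last script needs to finish logging. As a rough sketch of an alternative (not something I have run on SLES7), rc could wait until the log file stops growing, with an upper bound so boot never hangs. The helper name wait_for_quiet and the 5 second limit are made up for illustration, and a quiet log file is only an approximation of "the last daemon is done".

```shell
#!/bin/sh
# Hypothetical helper (not part of the stock rc script):
# wait until FILE stops growing, or give up after MAX seconds.
# Returns 0 if two consecutive size checks match, 1 on timeout.
wait_for_quiet() {
    file=$1
    max=${2:-5}          # assumed default upper bound, in seconds
    last=-1
    waited=0
    while [ "$waited" -lt "$max" ]; do
        if [ -r "$file" ]; then
            # Arithmetic expansion strips any padding wc may print
            size=$(( $(wc -c < "$file") ))
        else
            size=0
        fi
        # Same size as the previous check: assume logging has gone quiet
        [ "$size" -eq "$last" ] && return 0
        last=$size
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Possible use in rc in place of the fixed sleep (an untested assumption):
#   wait_for_quiet /var/log/boot.log 5
#   killproc -QUIT /sbin/blogd
```

On an idle log this costs about one second, since it needs two matching size checks. The trade-off compared with the plain sleep is more code in rc for a wait that adapts to how busy the last script is.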
