I've been having what seemed to be random crashes that leave nothing in the 
logs, until I noticed that they always happen just after 2:02 (while my daily 
cron jobs are running), so they're not random after all. Here are the last 3 
crashes, from 10/4, 6/5 and 9/5 (day/month). You can see that there are no log 
entries after 2:02 until I do a hard reboot a few hours later:

----- 1 ------
Apr 10 01:58:01 shlomo1 crond[9786]: (root) CMD (/data1/myscripts/myADSLtest)
Apr 10 02:00:01 shlomo1 crond[9811]: (root) CMD (/data1/myscripts/myADSLtest)
Apr 10 02:00:01 shlomo1 crond[9812]: (root) CMD (/data1/myscripts/myAlive)
Apr 10 02:01:01 shlomo1 crond[9830]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
Apr 10 02:02:01 shlomo1 crond[9845]: (root) CMD (/data1/myscripts/myADSLtest)
Apr 10 02:02:01 shlomo1 crond[9846]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
Apr 10 02:02:02 shlomo1 anacron[9856]: Updated timestamp for job `cron.daily' to 2008-04-10
Apr 10 02:02:02 shlomo1 /etc/cron.daily/awffull[9859]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
Apr 10 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
Apr 10 05:38:51 shlomo1 syslogd 1.4.2: restart.
Apr 10 05:38:51 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
Apr 10 05:38:51 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007


----- 2 ------
May  6 01:58:01 shlomo1 crond[21897]: (root) CMD (/data1/myscripts/myADSLtest)
May  6 02:00:01 shlomo1 crond[21916]: (root) CMD (/data1/myscripts/myAlive)
May  6 02:00:01 shlomo1 crond[21917]: (root) CMD (/data1/myscripts/myADSLtest)
May  6 02:01:01 shlomo1 crond[21937]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
May  6 02:02:01 shlomo1 crond[21951]: (root) CMD (/data1/myscripts/myADSLtest)
May  6 02:02:01 shlomo1 crond[21952]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
May  6 02:02:02 shlomo1 anacron[21962]: Updated timestamp for job `cron.daily' to 2008-05-06
May  6 02:02:02 shlomo1 /etc/cron.daily/awffull[21965]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
May  6 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
May  6 04:47:50 shlomo1 syslogd 1.4.2: restart.
May  6 04:47:50 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
May  6 04:47:50 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007


----- 3 ------
May  9 01:58:01 shlomo1 crond[27692]: (root) CMD (/data1/myscripts/myADSLtest)
May  9 02:00:01 shlomo1 crond[27708]: (root) CMD (/data1/myscripts/myAlive)
May  9 02:00:01 shlomo1 crond[27709]: (root) CMD (/data1/myscripts/myADSLtest)
May  9 02:01:01 shlomo1 crond[27726]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
May  9 02:02:01 shlomo1 crond[27741]: (root) CMD (/data1/myscripts/myADSLtest)
May  9 02:02:01 shlomo1 crond[27742]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
May  9 02:02:01 shlomo1 anacron[27752]: Updated timestamp for job `cron.daily' to 2008-05-09
May  9 02:02:01 shlomo1 /etc/cron.daily/awffull[27755]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
May  9 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
May  9 05:36:05 shlomo1 syslogd 1.4.2: restart.
May  9 05:36:05 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
May  9 05:36:05 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
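
In case anyone wants to check their own logs for the same pattern, excerpts 
like the ones above can be found with something like the following. It is only 
a rough sketch: the function name last_before_restart is my own, and the log 
path in the usage comment is an assumption (use whichever file your syslogd 
writes the cron messages to). It prints the last line logged before each 
syslogd restart, so normal reboots will show up as well as crashes.

```shell
#!/bin/sh
# Print the last line that was logged before each "syslogd ... restart."
# entry, i.e. the last thing written before each reboot (clean or not).
# Usage (log path is an assumption): last_before_restart /var/log/messages
last_before_restart() {
    awk '
        # A restart line: print whatever line came just before it.
        /syslogd .*: restart\./ { if (prev != "") print prev }
        # Remember the current line for the next iteration.
        { prev = $0 }
    ' "$1"
}
```

If the line printed is always the logrotate ALERT at about 2:02, followed by a 
gap of several hours, that matches what I'm seeing.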



The last entry before each crash is the logrotate error, but logrotate itself 
doesn't seem to be the cause. Here's the log from a night (7/5) when logrotate 
gave the same error and there was NO crash. In fact, logrotate seems to give 
that error every day. The strange thing is that all the logs seem to be 
properly rotated, despite the error message. 



May  7 01:58:01 shlomo1 crond[2870]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:00:01 shlomo1 crond[2888]: (root) CMD (/data1/myscripts/myAlive)
May  7 02:00:01 shlomo1 crond[2889]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:01:01 shlomo1 crond[2906]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
May  7 02:02:01 shlomo1 crond[2920]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:02:01 shlomo1 crond[2921]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
May  7 02:02:01 shlomo1 anacron[2931]: Updated timestamp for job `cron.daily' to 2008-05-07
May  7 02:02:01 shlomo1 /etc/cron.daily/awffull[2934]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
May  7 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
May  7 02:04:01 shlomo1 crond[3112]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:06:01 shlomo1 crond[3138]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:08:02 shlomo1 crond[3153]: (root) CMD (/data1/myscripts/myADSLtest)
May  7 02:09:02 shlomo1 crond[3164]: (root) CMD ([ -d /var/lib/php ] && find /var/lib/php/ -type f -mmin +$(/usr/lib/php/maxlifetime) -print0 | xargs -r -0 rm)



So, how do I find out what's causing the crash? My guess is that it's one of 
the daily cron jobs, but how can I find out which? Since the crashes happen 
at irregular intervals (sometimes 3 or 4 weeks apart and sometimes 2 days 
apart), it's not a simple matter of disabling some of the jobs to see if that 
solves the problem. That approach could take months.

BTW, here's a list of the daily cron jobs. My guess is that the problem is a 
job running after logrotate, so that leaves 8 possibilities.


[EMAIL PROTECTED] cron.daily]$ ls -l
total 56
-rwxr-xr-x 1 root root  276 2007-08-17 02:56 0anacron*
-rwxr-xr-x 1 root root 2575 2007-09-01 13:56 awffull*
-rwxr-xr-x 1 root root  396 2007-11-16 23:00 getskyepg*
-rwxr-xr-x 1 root root  400 2007-08-28 21:44 hylafax*
-rwxr-xr-x 1 root root   37 2007-01-28 19:59 logcheck*
-rwxr-xr-x 1 root root  180 2007-07-19 23:57 logrotate*
-rwxr-xr-x 1 root root  410 2007-08-31 01:48 makewhatis.cron*
-rwxr-xr-x 1 root root  137 2007-09-24 17:26 mlocate.cron*
lrwxrwxrwx 1 root root   27 2008-01-02 05:56 msec -> /usr/share/msec/security.sh*
-rwxr-xr-x 1 root root  431 2006-02-05 22:56 my-aa-findlargefiles*
lrwxrwxrwx 1 root root   26 2008-01-02 20:16 myRPMlist -> /data1/myscripts/myRPMlist*
-rwxr-xr-x 1 root root  167 2005-01-10 12:51 reoback*
-rwxr-xr-x 1 root root  118 2007-10-02 12:09 rpm*
-rwxr-xr-x 1 root root  101 2007-11-20 19:55 tetex.cron*
-rwxr-xr-x 1 root root  371 2007-08-08 18:35 tmpwatch*
-rwxr-xr-x 1 root root  315 2007-09-05 13:24 tripwire-check*


Can anyone suggest how to debug this problem? I did think of one idea and 
I'd like comments or suggestions. I could add several cron jobs to run after 
each of the "real" jobs (or add a line to each existing job) to send myself 
an e-mail, so that I know which jobs have run and can see when the e-mails 
stop coming. However, I'm not sure whether the daily cron jobs can overlap. 
For example, is it possible that job number 2 starts before job number 1 has 
ended? If so, then my idea probably wouldn't work. 
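
To make the idea more concrete, here is a rough sketch of a variation on it. 
Instead of sending e-mail, it writes START and END markers to a local file and 
calls sync after each one, on the theory that a marker flushed to disk is more 
likely to survive a hard crash than an e-mail still sitting in a queue. The 
function name run_marked and the marker-file path in the usage comment are made 
up for illustration. It runs the jobs strictly one at a time in name order, 
which sidesteps the overlap question, but I don't know whether that is the same 
order and behaviour that run-parts gives.

```shell
#!/bin/sh
# Run every executable file in a directory one at a time, in name order,
# writing a START and an END marker for each job to a log file and
# flushing it to disk, so the markers have a chance of surviving a crash.
# Usage (hypothetical paths):
#   run_marked /etc/cron.daily /var/log/cron-daily-markers.log
run_marked() {
    dir=$1
    log=$2
    for job in "$dir"/*; do
        # Skip directories and anything that is not executable.
        [ -f "$job" ] && [ -x "$job" ] || continue
        echo "$(date '+%b %e %H:%M:%S') START $job" >> "$log"
        sync
        "$job"
        status=$?
        echo "$(date '+%b %e %H:%M:%S') END   $job (exit $status)" >> "$log"
        sync
    done
}
```

If the machine dies in the middle of a job, the last line in the marker file 
should be a START with no matching END, and that would be the job to look at.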

-- 
Shlomo Solomon
http://the-solomons.net
Sent by KMail (KDE 3.5.7) on LINUX Mandriva 2008.0

