vixie-cron acting weird (actually not acting at all)

shimi Sat, 21 Apr 2007 11:45:51 -0700

Hi All,

Sorry for being so verbose, but I was not really sure which of all those 
details is important to understand the source of the problem :)


I put a new machine into production on Thursday (19 Apr 2007).
It has been running ever since:
$ uptime
 21:10:43 up 2 days, 10:11,  2 users,  load average: 0.02, 0.04, 0.01

The machine is a dual-core 3GHz Xeon with HyperThreading enabled (so "grep 
processor /proc/cpuinfo | wc -l" says "4").

Machine is running Gentoo Linux, with kernel 2.6.19-gentoo-r5, x86_64.

Now for my problem. 

I installed the system with Vixie-Cron, and crond appears to be running. It 
appears in the process list, and ps shows it in the Ss state. So far, very 
normal.

According to the logs, until April 20th, 19:00, jobs ran exactly when 
they were scheduled to run (I have a job that runs every 15 minutes).
From 19:00 until 23:16:55 (note the 1 minute and 55 seconds past the 
quarter hour) there is complete silence in the logs.

It then resumed running with the same delta from the quarter hour until 
00:02am on Apr 21. A bit later, I see that ntpd reports that ntp had no 
servers available to sync (at 2:31am). Doesn't seem related, but I am 
mentioning it anyways, as cron is, after all, time based. 

Occasionally after that time I see Gentoo's run-crons acting at some hours, 
like 2:50am, 03:07:17am (where it removes the lastrun of cron.daily) and 
04:26:15am (where it ran cron.weekly). At 03:14:25am I also see ntpd 
synchronized back against 192.43.244.18, stratum 1.

There is another run-crons at 05:40am, and weirdly enough, cron kicks back to 
life at Apr 21, 07:45:41am, with my regular 15-minute task, which it 
executes once. After that, silence until 09:00am, when only CERTAIN tasks are 
executed (and the every-15-minutes one is NOT), and again silence until 
17:16:13, when the every-15 kicks in; it then runs at 17:33 and 17:46, and 
since then, silence again.
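
The gaps described above can also be pulled out of a list of run times 
mechanically. Below is a minimal sketch, assuming Python 3 is available; the 
function name find_gaps and the 2-minute slack are illustrative assumptions, 
and the sample timestamps are the ones quoted above for Apr 21 (seconds for 
17:33 and 17:46 are not given, so :00 is assumed).

```python
from datetime import datetime, timedelta

def find_gaps(run_times, expected=timedelta(minutes=15),
              slack=timedelta(minutes=2)):
    """Return (previous, next, gap) for consecutive runs that are
    further apart than expected + slack."""
    ordered = sorted(run_times)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        delta = nxt - prev
        if delta > expected + slack:
            gaps.append((prev, nxt, delta))
    return gaps

# Run times of the every-15-minutes task on Apr 21, as quoted above.
runs = [
    datetime(2007, 4, 21, 7, 45, 41),
    datetime(2007, 4, 21, 17, 16, 13),
    datetime(2007, 4, 21, 17, 33, 0),   # seconds not given, :00 assumed
    datetime(2007, 4, 21, 17, 46, 0),   # seconds not given, :00 assumed
]

for prev, nxt, delta in find_gaps(runs):
    print(f"{prev} -> {nxt}: gap of {delta}")
```

The slack allows for the minute or two of offset from the quarter hour seen 
in the logs, so only the long silences are reported.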

I tried restarting cron at 20:51:27; the restart got logged, but it didn't 
seem to have any effect. At 21:20:31 I see a run-crons, yet the every-15 
still does not run.

I tried stracing the crond process, and I got the following:
Process 6796 attached - interrupt to quit
stat("crontabs", {st_mode=S_IFDIR|0750, st_size=120, ...}) = 0
stat("/etc/cron.d", {st_mode=S_IFDIR|0755, st_size=96, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=1365, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x40272b, [], SA_RESTORER|SA_RESTART, 0x2b28e1e255c0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({9, 0},

Which I gather should have exited from sleeping pretty quickly, but did not. 
Only after some time did I get this (the first line is the continuation of 
the last snippet):

{9, 0})               = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b28e2125e10) = 6941
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x40272b, [], SA_RESTORER|SA_RESTART, 0x2b28e1e255c0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({10, 0}, 0x7fffc8ed59c0)      = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11)                      = -1 EINTR (Interrupted system call)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 6941
wait4(-1, 0x7fffc8ed59dc, WNOHANG, NULL) = -1 ECHILD (No child processes)
stat("crontabs", {st_mode=S_IFDIR|0750, st_size=120, ...}) = 0
stat("/etc/cron.d", {st_mode=S_IFDIR|0755, st_size=96, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=1365, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x40272b, [], SA_RESTORER|SA_RESTART, 0x2b28e1e255c0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({30, 0},           

I ran the same strace on cron on my computer at home, which appeared to be 
sleeping for a different period (namely, 60 seconds), but this doesn't look 
so important, as the sleep on my computer at home ends rather quickly and I 
see many scans of crontabs, /etc/cron.d and /etc/crontab, which is normal 
behavior, I guess.

So I am thinking there may be something wrong with nanosleep(). But what can 
it be? My guess is that it is related to time drifting between all those 
CPUs, but in that case, why did it work great in the beginning?
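
One way to check the nanosleep() guess outside of cron is to request a sleep 
of a known length and compare how long it really took on two different 
clocks. This is only a sketch, assuming Python 3.3 or later on the machine 
(for time.monotonic()); the name measure_sleep is made up for illustration, 
and time.sleep() may not go through exactly the same system call as crond's 
nanosleep(), so it tests the general timer behaviour rather than cron itself.

```python
import time

def measure_sleep(seconds):
    """Sleep for `seconds` and return the elapsed time as seen by
    the wall clock and by the monotonic clock."""
    wall_start = time.time()        # wall clock, can be stepped (e.g. by ntpd)
    mono_start = time.monotonic()   # monotonic clock, never goes backwards
    time.sleep(seconds)
    wall_elapsed = time.time() - wall_start
    mono_elapsed = time.monotonic() - mono_start
    return wall_elapsed, mono_elapsed
```

Calling measure_sleep(9) or measure_sleep(30), the periods seen in the strace 
above, should return two values close to the requested length. If both are 
far larger, sleeps really are overrunning; if only the wall-clock value is 
off, the system time was probably adjusted during the sleep.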

I haven't tried rebooting the machine, which might solve the problem 
(either temporarily or not), but I wanted to try to solve it (or at least 
understand it) before a reboot makes it go away.

So, any hint from you guys will be greatly appreciated.

Thanks,

        -- Shimi
