On 02/01/07 21:51, Rolf E. Sonneveld wrote:
Ian Abbott wrote:
On 02/01/07 12:12, Rolf E. Sonneveld wrote:
According to the monitoring system, the freshclam process disappeared
between 14:29 and 14:34. Running ClamAV on Solaris 9. Any idea why
after a 'connection refused' or 'connection timed out' the freshclam
process dies?
It would be nice if there were an option to run freshclam as a
"foreground daemon" so you could monitor its exit status, but there
isn't. My guess is that it's receiving a signal whose current action
is set to kill the process.
The signal handling for SIGALRM and SIGUSR1 in freshclam.c's main()
function is a bit buggy. It sets the following actions in the main loop:
    sigaction(SIGALRM, &sigact, &oldact);
    sigaction(SIGUSR1, &sigact, &oldact);
then later on:
    sigaction(SIGALRM, &oldact, NULL);
    sigaction(SIGUSR1, &oldact, NULL);
There are two problems here. First, the two signals shouldn't really
share the variable 'oldact': the second call overwrites the action
saved by the first, and restoring both from it only works because the
default action for both signals is the same. Second, the program
spends some of its time with the SIGALRM and SIGUSR1 signals set to
the default action, which is to terminate the process. In fact, the
more I look at the main loop of the freshclam daemon, the worse it
gets! It may catch SIGHUP and set the 'terminate' variable at the
wrong time, causing the main loop to exit prematurely, or it may fail
to catch 'SIGALRM' or 'SIGUSR1' some of the time, causing the process
to terminate with that signal.
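To illustrate the two points above, here is a minimal sketch. It is not the
actual freshclam code and not a proposed patch. It installs one handler for
both signals and saves each previous action in its own variable. The names
wake_handler, install_wake_handlers, got_alarm and got_usr1 are made up for
illustration, and installing the handler once, before the main loop, rather
than restoring the saved actions inside it, is only an assumption about one
possible fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flags written by the handler. volatile sig_atomic_t is the type that
 * is safe to write from a signal handler and read from the main loop. */
static volatile sig_atomic_t got_alarm = 0;
static volatile sig_atomic_t got_usr1 = 0;

/* Hypothetical handler: only records which signal arrived. */
static void wake_handler(int sig)
{
    if (sig == SIGALRM)
        got_alarm = 1;
    else if (sig == SIGUSR1)
        got_usr1 = 1;
}

/* Install wake_handler for SIGALRM and SIGUSR1. Each previous action is
 * saved in its own struct, so one cannot overwrite the other.
 * Returns 0 on success, -1 if either sigaction() call fails. */
int install_wake_handlers(struct sigaction *old_alrm,
                          struct sigaction *old_usr1)
{
    struct sigaction sa;

    assert(old_alrm != NULL && old_usr1 != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGALRM, &sa, old_alrm) == -1)
        return -1;
    if (sigaction(SIGUSR1, &sa, old_usr1) == -1)
        return -1;
    return 0;
}
```

With the handler installed once and left in place, a late SIGALRM or SIGUSR1
just sets a flag for the main loop to check, instead of hitting the default
action.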
Thanks, Ian. This sounds interesting. If I understand you correctly,
this could be related to the problem we see with the disappearing
freshclam daemon process? I'm not a programmer so I'm afraid I can't
contribute code here; also, I'm not familiar with the way ClamAV
changes/fixes are done. Is anyone in charge of the freshclam code?
It might be the problem, especially if you are sending a signal (SIGHUP)
to the freshclam process from a log rotation script. If this occurs
almost immediately after an internally generated SIGALRM, it could cause
the main loop to terminate early, though that is extremely unlikely as
the time window is very small. A far more likely cause is that the
process is woken up by the SIGHUP and then the internally generated
SIGALRM occurs later, killing the process. The program uses the default
SIGALRM handler while it is doing all the network stuff, for example, so
if the process is woken by an external SIGHUP, spends a lot of time
doing network stuff, and receives the internally generated SIGALRM at
this time, the process will be killed.
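The failure mode described above can be reproduced in isolation. The sketch
below is a standalone demonstration, not freshclam code. It forks a child that
leaves SIGALRM at its default action, arms a timer with alarm(), and then
waits, standing in for the time spent on network activity. The function name
signal_that_killed_child is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that keeps the default SIGALRM action, sets an alarm for
 * `seconds`, and waits. Returns the number of the signal that terminated
 * the child, 0 if the child exited normally, or -1 on error. */
int signal_that_killed_child(unsigned seconds)
{
    struct sigaction dfl;
    sigset_t unblock;
    pid_t pid;
    int status;

    assert(seconds > 0);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: put SIGALRM back to its default action, which is
         * what restoring a saved default action amounts to. */
        dfl.sa_handler = SIG_DFL;
        sigemptyset(&dfl.sa_mask);
        dfl.sa_flags = 0;
        sigaction(SIGALRM, &dfl, NULL);

        /* Make sure the signal is not blocked by an inherited mask. */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGALRM);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        alarm(seconds);
        for (;;)
            pause();    /* placeholder for slow network activity */
    }

    /* Parent: collect the child and report how it ended. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_that_killed_child(1) should return SIGALRM: the child is
terminated by the signal rather than exiting on its own, which from the
outside looks just like a daemon that "disappeared".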
I'll mention my theory on the devel list, anyway.
--
-=( Ian Abbott @ MEV Ltd. E-mail: <[EMAIL PROTECTED]> )=-
-=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-