On 02/01/07 21:51, Rolf E. Sonneveld wrote:
Ian Abbott wrote:
On 02/01/07 12:12, Rolf E. Sonneveld wrote:
According to the monitoring system, the freshclam process disappeared between 14:29 and 14:34. Running ClamAV on Solaris 9. Any idea why after a 'connection refused' or 'connection timed out' the freshclam process dies?

It would be nice if there were an option to run freshclam as a "foreground daemon" so you could monitor its exit status, but there isn't. My guess is that it's receiving a signal whose current action is set to kill the process.
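
To illustrate what monitoring the exit status would show, here is a minimal sketch of a parent process that forks a child and reports whether the child was killed by a signal. It is not freshclam code; the names terminating_signal_of, child_killed_by_alarm and child_exits_cleanly are hypothetical, and the child functions only stand in for a daemon.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run child_fn in a forked child and wait for it.
 * Returns the number of the signal that killed the child,
 * 0 if the child was not killed by a signal, or -1 on error. */
static int terminating_signal_of(void (*child_fn)(void))
{
    pid_t pid;
    int status;

    assert(child_fn != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        child_fn();
        _exit(0);               /* child returned normally */
    }
    /* Retry if the wait itself is interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical stand-in: a process that receives SIGALRM while the
 * default action (terminate the process) is in effect. */
static void child_killed_by_alarm(void)
{
    sigset_t set;

    signal(SIGALRM, SIG_DFL);
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    raise(SIGALRM);
}

/* Hypothetical stand-in: a process that simply finishes. */
static void child_exits_cleanly(void)
{
}
```

Calling terminating_signal_of(child_killed_by_alarm) should report SIGALRM, which is the kind of evidence a supervising process would need to confirm the theory.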

The signal handling for SIGALRM and SIGUSR1 in freshclam.c's main() function is a bit buggy. It sets the following actions in the main loop:

        sigaction(SIGALRM, &sigact, &oldact);
        sigaction(SIGUSR1, &sigact, &oldact);

then later on:

        sigaction(SIGALRM, &oldact, NULL);
        sigaction(SIGUSR1, &oldact, NULL);

There are two problems here. First, the two signals shouldn't share the same variable 'oldact', even though the default action for both signals is the same. Second, the program spends some of its time with SIGALRM and SIGUSR1 set to the default action, which is to terminate the process. In fact, the more I look at the main loop of the freshclam daemon, the worse it gets! It may catch SIGHUP and set the 'terminate' variable at the wrong time, causing the main loop to exit prematurely, or it may fail to catch SIGALRM or SIGUSR1 some of the time, causing the process to terminate with that signal.
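
As a rough sketch of one way to avoid both problems (an assumption about a possible approach, not the actual freshclam code or any fix applied to it), the handlers could be installed once, before the main loop, with each previous action saved in its own variable. The names on_signal, got_alarm, got_usr1, install_handlers and restore_handlers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flags; the handler only records that a signal arrived. */
static volatile sig_atomic_t got_alarm = 0;
static volatile sig_atomic_t got_usr1 = 0;

static void on_signal(int sig)
{
    if (sig == SIGALRM)
        got_alarm = 1;
    else if (sig == SIGUSR1)
        got_usr1 = 1;
}

/* Install on_signal for SIGALRM and SIGUSR1, saving each previous
 * action in a separate struct. Returns 0 on success, -1 on error. */
static int install_handlers(struct sigaction *old_alrm,
                            struct sigaction *old_usr1)
{
    struct sigaction sa;

    /* The two save slots must be distinct, unlike a shared 'oldact'. */
    assert(old_alrm != NULL && old_usr1 != NULL && old_alrm != old_usr1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGALRM, &sa, old_alrm) == -1)
        return -1;
    if (sigaction(SIGUSR1, &sa, old_usr1) == -1) {
        sigaction(SIGALRM, old_alrm, NULL);   /* undo the first step */
        return -1;
    }
    return 0;
}

/* Put back the saved actions, each from its own slot. */
static int restore_handlers(const struct sigaction *old_alrm,
                            const struct sigaction *old_usr1)
{
    int rc = 0;

    if (sigaction(SIGALRM, old_alrm, NULL) == -1)
        rc = -1;
    if (sigaction(SIGUSR1, old_usr1, NULL) == -1)
        rc = -1;
    return rc;
}
```

If install_handlers() is called once at startup and restore_handlers() only at shutdown, there is no point in the loop where either signal falls back to its default action.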

Thanks, Ian. This sounds interesting. If I understand you correctly, this could be related to the problem we see with the disappearing freshclam daemon process? I'm not a programmer, so I'm afraid I can't contribute code here; also, I'm not familiar with the way ClamAV changes/fixes are done. Is anyone in charge of the freshclam code?

It might be the problem, especially if you are sending a signal (SIGHUP) to the freshclam process from a log rotation script. If the SIGHUP arrives almost immediately after an internally generated SIGALRM, it could cause the main loop to terminate early, though that is extremely unlikely as the time window is very small. A far more likely cause is that the process is woken up by the SIGHUP and the internally generated SIGALRM then arrives later, killing the process. For example, the default SIGALRM action is in effect while the program is doing all the network stuff, so if the process is woken by an external SIGHUP, spends a lot of time doing network stuff, and receives the internally generated SIGALRM during that time, the process will be killed.
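
The scenario above can be sketched in a few lines. This is only an illustration under stated assumptions, not freshclam's code: a handler stays installed for the whole run, and SIGALRM and SIGUSR1 are blocked while the long-running work is in progress, so a timer signal that arrives mid-update is held as pending instead of killing the process. The names on_wake, install_wake_handler, run_with_signals_held and simulated_update are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag set when the wake-up signal is finally delivered. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_wake(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Install on_wake for SIGALRM and SIGUSR1 once, before any work. */
static int install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Run work() with SIGALRM and SIGUSR1 blocked. Returns 1 if either
 * signal arrived during the work, 0 if not, -1 on error. Any held
 * signal is delivered to the handler when the mask is restored. */
static int run_with_signals_held(void (*work)(void))
{
    sigset_t held, previous, pending;
    int arrived;

    assert(work != NULL);
    sigemptyset(&held);
    sigaddset(&held, SIGALRM);
    sigaddset(&held, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &held, &previous) == -1)
        return -1;

    work();

    if (sigpending(&pending) == -1) {
        sigprocmask(SIG_SETMASK, &previous, NULL);
        return -1;
    }
    arrived = sigismember(&pending, SIGALRM) == 1 ||
              sigismember(&pending, SIGUSR1) == 1;

    if (sigprocmask(SIG_SETMASK, &previous, NULL) == -1)
        return -1;
    return arrived;
}

/* Hypothetical stand-in for a slow network update during which the
 * internally generated SIGALRM happens to arrive. */
static void simulated_update(void)
{
    raise(SIGALRM);
}
```

With the handler in place and the signals held, the late SIGALRM is merely recorded; with the default action in effect at that moment, the same signal would terminate the process.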

I'll mention my theory on the devel list, anyway.

--
-=( Ian Abbott @ MEV Ltd.    E-mail: <[EMAIL PROTECTED]>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://lurker.clamav.net/list/clamav-users.html
