On May 24, 2005, at 19:41, Stephen Gran wrote:
On Tue, May 24, 2005 at 07:10:25PM -0700, Doug Hardie said:
On May 24, 2005, at 13:21, Stephen Gran wrote:
On Tue, May 24, 2005 at 12:54:47PM -0700, Doug Hardie said:
ktrace is effectively the same thing as truss so I used it. There
are two files available:
http://www.lafn.org/clamav/ktrace.html
http://www.lafn.org/clamav/clamd.html
ktrace.html is the output of ktrace; it's about 14 MB. clamd.html is
the clamd.log file entries; it's very small and probably of no value.
It is difficult to say from the provided ktrace file what is
happening, as there are no timestamps and all lines have the same
pid. One thing that seems odd is that the milter appears to continue
accepting and processing input after a reload event has happened:
not for the body, but for all the other milter events (header,
connect, etc.). That is a start, at least.
Is there a way to log separately by pid or something with ktrace? I
don't know it well, so I am not sure what arguments to tell you to
pass it. Also, I am not sure that will even work: in a proper thread
implementation, all threads share a pid (but have different LWP IDs),
so this may not be possible.
clamav-milter is only one process. It has multiple threads but those
are not visible to the kernel.
I don't know how the BSD implementation of threads works, as I said.
On Linux, the separate threads share a pid but have different LWP
IDs, and are distinguishable to the kernel and to strace. It will
make things a little harder if the same is not true on BSD.
The problem does not occur immediately with a database reload. It
takes 10 or so minutes before it hangs/quits. I suspect that the
problem occurs when there are active messages that do not complete
before some timeout value. clamav-milter waits for everything to go
quiet, but on my incoming mail server that never happens: there are
always 30-40 active sendmail children, so it never goes quiet. I
suspect that clamav-milter eventually gives up, and that's when the
problem occurs. On my outgoing mail server, which handles
considerably less mail, most of the database updates do not cause a
problem. On my test server, which handles 3 emails a day, it never
causes a problem.
This is the generally observed pattern, so it's good to know we're
chasing the same problem, at least.
kdump will provide the timestamps if that would be helpful, but the
entries are pretty much evenly spaced over about a 5-minute period
between when I touched the daily file and when it hung.
Well, that's helpful - looking at the file at first, I had no way of
telling that.
What I can glean from the output you have provided is that a point
is reached where some threads begin doing a write (not accepting
input), which I would expect from the source. But, puzzlingly, some
threads (others? There is no way to know without being able to
separate the threads) are still accepting and processing messages
after that point.
I also see no mutex-related calls, which I would have expected to
see a lot of. Since I suspect the problem is that one thread is
prematurely altering or locking a mutex, stalling the others, this
makes it harder to debug the sequence of events :) This is presumably
a limitation of ktrace or the invocation rather than an absence of
events, though. From what I can find in their respective man pages,
truss may be better at this sort of thing than ktrace (it certainly
seems to do a better job of following forks and threads in the
Solaris page I see). Do you mind giving it a go?
truss basically generates no output and kills the process. strace
does not generate any output that identifies threads, and shortly
after starting it produces bus errors.
I don't believe clamav-milter is actually stopping new messages.
After touching the database file, I continue to see maillog entries
saying that the Milter messages have been added. This continues right
up until it hangs/crashes. I don't see anything tempfailing the new
messages. I suspect sendmail continues to send them and they continue
to be processed.
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html