On May 24, 2005, at 19:41, Stephen Gran wrote:

On Tue, May 24, 2005 at 07:10:25PM -0700, Doug Hardie said:


On May 24, 2005, at 13:21, Stephen Gran wrote:


On Tue, May 24, 2005 at 12:54:47PM -0700, Doug Hardie said:


ktrace is effectively the same thing as truss so I used it.  There
are two files available:

http://www.lafn.org/clamav/ktrace.html
http://www.lafn.org/clamav/clamd.html

ktrace.html is the output of ktrace - it's about 14 MB.  clamd.html is
the clamd.log file entries - very small and probably of no value.



It is difficult to say from the provided ktrace file what is
happening, as there are no timestamps and all lines have the same
pid. One thing that seems odd is that the milter appears to continue
accepting and processing input after a reload event has happened.
Not for the body, but for all other milter events (header, connect,
etc).  That is a start at least.

Is there a way to log separately by pid or something with ktrace?  I
don't know it well, so I am not sure what arguments to tell you to
pass it.  Also, I am not sure that will even work - in a proper
thread implementation, all threads share a pid (but have different
lwp ids) so this may not be possible.


clamav-milter is only one process.  It has multiple threads but those
are not visible to the kernel.


I don't know how the bsd implementation of threads works, as I said.
On linux, the separate threads share a pid but have different lwp ids,
and are separable to the kernel and to strace.  It will make things a
little harder if the same is not true on bsd.


The problem does not occur immediately with a database reload.  It
takes 10 or so minutes before it hangs/quits.  I suspect that the
problem occurs when there are active messages that do not complete
before some timeout value.  clamav-milter is waiting for everything
to go quiet, but on my receive mail server that never happens.  There
are always 30-40 active sendmail children.  As a result it never goes
quiet.  I suspect that clamav-milter eventually gives up and that's
when the problem occurs.  On my outgoing mail server, which handles
considerably less mail, most of the database updates do not cause a
problem.  On my test server, which handles 3 emails daily, it never
causes a problem.
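
To make the suspected failure mode concrete, here is a minimal sketch in
Python. It is not clamav-milter's code, and the class and method names are
hypothetical. It only shows the general pattern described above: a reloader
waits for the count of in-flight messages to reach zero, and on a server
that is never idle the wait can only end by timing out.

```python
import threading


class SessionTracker:
    """Hypothetical helper: counts in-flight messages for a reloader."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0

    def begin(self):
        # Called when a message starts being processed.
        with self._cond:
            self._active += 1

    def end(self):
        # Called when a message finishes; wake the reloader if idle.
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()

    def wait_until_quiet(self, timeout):
        # Returns True if the count hit zero within `timeout` seconds,
        # False if the wait gave up first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._active == 0, timeout=timeout
            )
```

With a tracker like this, a lightly loaded server returns True almost at
once, while a server with 30-40 messages always in flight returns False
after the timeout. What the caller does on that False path is where I
would expect the trouble to be.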


This is the generally observed pattern, so it's good to know we're
chasing the same problem, at least.


kdump will provide the timestamps if that would be helpful, but the
entries are pretty much evenly spaced out over about a 5 minute period
between when I touched the daily file and when it hung.


Well, that's helpful - looking at the file at first, I had no way of
telling that.

What I can glean from the output you have provided is that there is a
point reached where some threads begin doing a write (not accepting
inputs), which I would expect from the source.  But puzzlingly, some
(other? No way to know without being able to separate the threads) are
still accepting and processing messages after that point.

I also see no mutex related calls, which I would have expected to see a
lot of.  Since I suspect the problem is that one thread is prematurely
altering or locking a mutex, stalling the others, this makes it harder
to debug the sequence of events :)  This is presumably a problem with
ktrace or the invocation, rather than an absence of events, though.  It
appears, from what I can find of their respective man pages, that truss
may be better at this sort of thing than ktrace (it certainly seems to
do a better job following forks and threads in the solaris page I see).
Do you mind giving it a go?


truss basically generates no output and kills the process.
strace does not generate any output that identifies threads, and shortly after starting it generates bus errors.

I don't believe clamav-milter is actually stopping new messages. After doing the touch on the database file, I continue to see maillog messages that the Milter messages have been added. This continues right up until it hangs/crashes. I don't see anything tempfailing the new messages. I suspect sendmail continues to send them and they continue to be processed.

_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
