Michael, Ralf,

> >> but, if clamav hung on the primary (like it has done twice since
> >> upgrading to 0.97.1),
> >
> > Ah, it's happening to you as well? Happened here twice or three times
> > already :(
> >
> >> amavisd just seems to sit there till I totally kill clamd with a
> >> sigsegv.
> >
> > Yeah, same here.
>
> I still want to put a timeout in amavisd so that my secondary takes over.
> anyone help?

Well, timeouts on virus scanners are implemented at two levels: by setting an alarm and catching its signal, and by a timeout on a socket. The timeout for each operation is calculated dynamically from the time remaining until a deadline (improved somewhat in 2.7.0), so the only user-configurable setting is $child_timeout, with a sensible value perhaps a bit under a minute, like 45 seconds. That applies to a proxy setup with 2.7.0(-rc/pre). With a post-queue setup one can afford a longer time limit.

But apparently (at least in Ralf's case with 2.7.0) these timeout mechanisms did not do their job. It seems that one of the operations (connect/write/read/close) got stuck in an uninterruptible state.

This hasn't happened here yet (despite running 0.97.1), but our mail traffic is much lighter than yours, so I don't see how to test this. Is there a way to make clamd get stuck at will?

It would be useful to see the amavisd log (at log level 3 at least) when this happens, or perhaps later from a debug run with clamd still stuck.

It may be possible to have two instances of clamd running on separate sockets, and when one fails, switch over and restart amavisd on the other, while leaving the first for experimentation.

Mark
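
For illustration only, the idea of deriving each socket timeout from the time left until an overall deadline can be sketched roughly as below. This is not amavisd code (amavisd is written in Perl), and the names DeadlineExceeded, time_left and recv_before are hypothetical. It also covers only the socket-level timeout, not the alarm-based one, and it would not help if the call were stuck in a truly uninterruptible state.

```python
import socket
import time


class DeadlineExceeded(Exception):
    """Raised when the overall deadline passes (hypothetical name)."""


def time_left(deadline, now=None):
    """Seconds remaining until `deadline`, a time.monotonic() value."""
    if now is None:
        now = time.monotonic()
    return deadline - now


def recv_before(sock, deadline, nbytes=4096):
    """Read once from `sock`, allowing only the time left until `deadline`."""
    remaining = time_left(deadline)
    if remaining <= 0:
        raise DeadlineExceeded("no time left before read")
    # The per-operation timeout is derived from the shared deadline,
    # so a later operation gets less time than an earlier one.
    sock.settimeout(remaining)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        raise DeadlineExceeded("peer did not answer in time") from None
```

A caller would compute the deadline once, for example time.monotonic() + 45, pass the same value to every connect/write/read step, and treat DeadlineExceeded as the signal to fall back to a secondary scanner.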
