Hi,

For a while now I've had stability problems with ASSP, generally one or
two restarts a day. When I upgraded to 16270 I had huge problems: mail was
delayed or not getting through at all, and ASSP kept shutting down, each
time after thousands of "unable to detect any running worker" messages had
been spewed into the log files. Yesterday I dropped back to 16256 and at
least mail is flowing, but so far one ASSP instance has shut itself off 8
times today.

I've two instances doing this, both on Ubuntu 14.04.

Last night I set debugging on and caught the incident on both servers
within half an hour. I've looked through the debug file but there is
nothing I can see to indicate any errors. Both servers were handling mail
from different senders in the few minutes leading up to the fault.

So today I looked back through previous threads on the same issue and saw
Thomas ask what the worker status page showed when it happens. I was
wondering how on earth I was going to catch it as it happens, before it
restarts, when lo and behold, whilst I was on the web interface, I saw the
errors flying past in a tail of the maillog.
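
For anyone else trying to pin down when these bursts start, here is a
minimal sketch of how the maillog could be scanned for the error. The
message text is the one quoted above; everything else is an assumption: the
function name is hypothetical, and it assumes each log line begins with a
timestamp whose first 16 characters identify the minute, which may not
match your log format (adjust stamp_len if not).

```python
import sys
from collections import Counter

# Error text as it appears in the ASSP log during the incidents.
NEEDLE = "unable to detect any running worker"


def count_errors_per_minute(lines, stamp_len=16):
    """Count matching log lines, grouped by their leading timestamp prefix.

    stamp_len is an assumption about the log format: the number of leading
    characters that identify the minute a line was written in.
    """
    counts = Counter()
    for line in lines:
        if NEEDLE in line:
            counts[line[:stamp_len]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_worker_errors.py /path/to/maillog
    with open(sys.argv[1], errors="replace") as fh:
        for minute, n in sorted(count_errors_per_minute(fh).items()):
            print(minute, n)
```

The first minute that shows up in the output is a reasonable place to start
reading the debug file.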

I went on the web interface and the dot at the top had turned red. I then
went on the worker status page and that was all green.

Up until now, I have been running 10 workers which is possibly overkill. I
had just reduced this instance of ASSP to 5 workers as a test. Status of
the workers is:
1,2,3,5 - ThreadGetNewCon with loop age 0s (worker 3 had 1s)
4 - Maillog
10000 - MonitorMainThread (0s)
10001 - schedule waiting (71s)

I went back to the main page and the dot had gone back to green but the
maillog was still filling with the running worker errors.

I refreshed the status and the only changes were:
4 - "wh:0 - write: - wait: 0.005"
The time on schedule waiting went up to 96s.

Shortly after, ASSP shut down. It is as if the main thread and the workers
just stop talking to each other.

I'd love to crack this and give the latest development version a go,
because right now I have the annoying issue of an SSL session taking so
long that the sending server starts sending the message again (in this case
the sender is smtproutes.com, not gmail). I've also seen this from Mandrill
and got in touch with their support. They explained that it was down to
shared spools. One
server behind the infrastructure picks up the message and starts delivering
it. 10 minutes later it is still there so another server picks it up and
starts delivering it. Whichever completes first removes the file and the
other servers terminate.
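
To put numbers on that, here is a rough sketch of how many parallel
deliveries a slow session would trigger. It is only a model of the
behaviour support described, not their actual implementation: the function
name is hypothetical, and it assumes the spool is re-checked exactly every
10 minutes and that every attempt takes the same time to complete.

```python
def overlapping_attempts(session_seconds, retry_interval_seconds=600):
    """Number of delivery attempts started before the first one finishes.

    Assumes attempt k starts at k * retry_interval_seconds and that the
    message leaves the spool once the first attempt completes.
    """
    if session_seconds <= 0:
        return 0
    # Count the start times 0, interval, 2*interval, ... that fall strictly
    # before the moment the first attempt completes (integer ceiling).
    return (session_seconds + retry_interval_seconds - 1) // retry_interval_seconds


# A 5 minute session is delivered once; a 25 minute session overlaps three.
print(overlapping_attempts(300), overlapping_attempts(1500))
```

Under those assumptions, any session that runs past the 10 minute mark
gets a second delivery attempt from the sending side.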

All the best,
Colin.