Jun 27 16:21:26 raw courierd: SHUTDOWN: respawnhi limit reached.
Jun 27 16:21:26 raw courierd: Waiting. shutdown time=none, wakeup time=none, queuedelivering=2, inprogress=2
Jun 27 16:21:26 raw courierlocal: id=002DA623.3EFCB551.00003133,from=<[EMAIL PROTECTED]>,addr=<[EMAIL PROTECTED]>,size=21612,success: Message delivered.
Jun 27 16:21:26 raw courierd: completed,id=002DA623.3EFCB551.00003133
Jun 27 16:21:26 raw courierd: Waiting. shutdown time=none, wakeup time=none, queuedelivering=1, inprogress=1
Jun 27 16:21:32 raw pop3d: Connection, ip=[66.137.223.70]
Jun 27 16:21:32 raw pop3d: LOGIN, user=pag-emehler, ip=[66.137.223.70]
Jun 27 16:21:32 raw courieresmtpd: started,ip=[209.74.143.13]
Jun 27 16:21:32 raw pop3d: LOGOUT, user=pag-emehler, ip=[66.137.223.70], top=0, retr=1921
Jun 27 16:21:33 raw courierd: Waiting. shutdown time=none, wakeup time=none, queuedelivering=1, inprogress=1
Any ideas? Thanks,
A delivery attempt got stalled for over a week. Not just a message, but a single attempt to deliver a message.
The default configuration settings require a server restart at least once a week (that is the upper limit; a server restart may happen as often as once an hour, if mail traffic is light). A server restart naturally requires that all outstanding delivery attempts complete first. Whether the message was successfully delivered, rejected, or deferred is irrelevant; the delivery attempt only has to finish, whatever its result. Since a delivery attempt apparently got "stuck", the server would essentially wait forever for the stalled attempt, and it would not begin processing any other mail in the meantime. You had to do a full server stop, which forcefully aborts all delivery attempts, and then restart.
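The symptom in the log excerpt (a SHUTDOWN line followed by "Waiting." lines whose inprogress count never reaches zero) can be checked for mechanically. The following is only a rough sketch in Python: the function name restart_status is made up for illustration, the patterns are taken solely from the log lines quoted above, and it makes no attempt to recognise whatever courierd logs once a restart actually completes.

```python
import re

# Pattern based only on the "Waiting." lines in the quoted log excerpt.
WAITING_RE = re.compile(
    r"courierd: Waiting\..*queuedelivering=(\d+), inprogress=(\d+)"
)

def restart_status(log_lines):
    """Return (shutdown_seen, last_inprogress).

    shutdown_seen is True once a courierd SHUTDOWN line has appeared;
    last_inprogress is the inprogress count from the most recent
    "Waiting." line after it, or None if there was no such line.
    """
    shutdown_seen = False
    last_inprogress = None
    for line in log_lines:
        if "courierd: SHUTDOWN:" in line:
            shutdown_seen = True
            last_inprogress = None
        elif shutdown_seen:
            match = WAITING_RE.search(line)
            if match:
                last_inprogress = int(match.group(2))
    return shutdown_seen, last_inprogress
```

If this keeps returning True with a non-zero count long after the SHUTDOWN line, a restart is probably being held up by a delivery attempt that has not completed.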
Although techniques such as a watchdog timer could be used, they would only mask the real problem, if a real problem exists; and I'd rather identify any real problems, and fix them.
You need to monitor the system for any delivery attempt that gets stuck again. You have a full week before shit hits the fan, to catch a stalled delivery attempt. It's quite easy to monitor for messages that remain in the queue for more than a couple of days, then search for any courieresmtp (not courieresmtpd) process that corresponds to the message, and which has supposedly been running since the message was first introduced into the system.
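One way to spot a long-running courieresmtp process is to look at process ages. This is a minimal sketch, not a Courier tool: the function names and the two-day threshold are illustrative, and it assumes a ps that supports the etimes (elapsed seconds) output column, as procps on Linux does. Other systems may need a different ps invocation.

```python
import subprocess

TWO_DAYS = 2 * 24 * 3600  # illustrative threshold, in seconds

def find_stalled(ps_output, name="courieresmtp", min_age=TWO_DAYS):
    """Return [(pid, elapsed_seconds)] for processes named exactly `name`
    that have been running for at least `min_age` seconds.

    `ps_output` is expected to hold "pid elapsed_seconds command" per line.
    The exact name match keeps courieresmtpd out of the result.
    """
    stalled = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, elapsed, command = fields
        if not (pid.isdigit() and elapsed.isdigit()):
            continue
        if command.strip() == name and int(elapsed) >= min_age:
            stalled.append((int(pid), int(elapsed)))
    return stalled

def list_processes():
    """Run ps and return its output (assumes procps-style etimes support)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running find_stalled(list_processes()) from cron once a day would leave plenty of time before the weekly restart limit is reached.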
Once you've identified the stalled courieresmtp process, you'll need to perform some diagnostics to identify what the process is doing. On Linux, tools such as strace can show which system calls the process is making; your platform should have a similar tool. You should also use make install, instead of make install-strip, to install binaries with debug data; then you can attach a debugger to the stalled courieresmtp process, and obtain a stack backtrace to identify exactly where the process is running.
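Grabbing the backtrace can be scripted as well. The sketch below assumes gdb is installed and that the caller has permission to attach to the process (normally root); the helper names are made up for illustration. It runs gdb in batch mode, which attaches, prints the backtrace, and detaches, so the process itself is left running.

```python
import subprocess

def backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints a stack
    backtrace, and exits (batch mode detaches when it is done)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]

def capture_backtrace(pid, timeout=60):
    """Run gdb against `pid` and return whatever it printed.

    Assumes gdb is on PATH; the backtrace is only readable if the
    binaries were installed unstripped (make install).
    """
    result = subprocess.run(
        backtrace_command(pid),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr
```

For the system-call view, running strace -p with the same PID by hand for a few seconds is usually enough to see whether the process is blocked in a read, a connect, or a lock.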
Once you've gathered sufficient information, it should be possible to identify what the bug, if any, is.
