Johannes Erdfelt writes:

> On Thu, Dec 06, 2001, Gordon Messmer <[EMAIL PROTECTED]> wrote:
>> On Thu, 6 Dec 2001, Johannes Erdfelt wrote:
>> > The mail server is busy much of the time, but I don't think it's busy
>> > enough to naturally hit the respawnhi timeout. It looks like somehow
>> > courier missed that a child finished and that's why it hit the respawnhi
>> > timeout.
>>
>> I was wrong about that. The child processes are still legitimately
>> running. As fate would have it just as I started this email, I was pulled
>> in to some mail server issues and noticed that the respawnhi thing had
>> happened again. All of the couriersmtp processes were stuck in a read()
>> system call on fd 5. I have the control file from a couple, and there are
>> lots of DNS failures recorded.
>>
>> It's much too late to do any debugging right now, but I'll be over this
>> tomorrow. In any case, it's not that courierd isn't harvesting children,
>> it's that the children are blocking on an unprotected read(). (I thought
>> they all had alarms in place... /me shrugs)
>
> I checked for any running processes, but I couldn't find any. I do have
> lots of courier related process running (authdaemon, pop and imap) so I
> may have missed one.
>
> Either way, my system sat for 6 hours or so doing nothing. If you're
> right that there was a process still running, something is missing a
> timeout.
>
> I wonder what the longest timeout is. I guess presumably the respawnhi
> could happen at a time right after a legitimate process is spawned which
> then needs to timeout to a client, there will always be the chance that
> courier just stops delivering email for a while.
>
> respawnhi seems to need some sort of timeout, even if it's extremely
> long.

The server is designed to restart itself only when no mail is pending.

The problem is that the client should not be stuck like that. There's a
select() before every read from the socket, so if anything, it should be
stuck in a select().

Get the date of the stuck message, and review your logs to see if there
are any errors in syslog around that time, or a little bit later.

-- 
Sam

_______________________________________________
courier-users mailing list
[EMAIL PROTECTED]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
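
For readers following the thread, the "select() before every read" pattern mentioned in the reply can be sketched roughly as below. This is only an illustration of the general technique, not Courier's actual code: the helper name read_with_timeout, its -2 return value for a timeout, and the whole-second timeout parameter are all assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from Courier's source).
 * Waits up to timeout_secs for fd to become readable, then reads once.
 * Returns the number of bytes read (0 on EOF), -1 on error with errno
 * set, or -2 if the wait timed out before any data arrived.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_secs)
{
    assert(buf != NULL && len > 0);
    assert(fd >= 0 && fd < FD_SETSIZE);  /* select() cannot watch larger fds */

    for (;;) {
        fd_set readfds;
        struct timeval tv;
        int ready;

        /* Both the set and the timeout must be rebuilt before each call,
         * because select() may modify them. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == 0)
            return -2;                   /* nothing readable in time */
        if (ready < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: wait again (full timeout) */
            return -1;
        }

        /* fd is readable, so this read() should not block indefinitely. */
        return read(fd, buf, len);
    }
}
```

With a guard like this, a process waiting on a silent peer would show up blocked in select() and give up after the timeout, whereas a process seen sitting in a bare read() suggests a code path that does not go through such a guard.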
