Johannes Erdfelt writes: 

> On Thu, Dec 06, 2001, Gordon Messmer <[EMAIL PROTECTED]> wrote:
>> On Thu, 6 Dec 2001, Johannes Erdfelt wrote:
>> > The mail server is busy much of the time, but I don't think it's busy
>> > enough to naturally hit the respawnhi timeout. It looks like somehow
>> > courier missed that a child finished and that's why it hit the respawnhi
>> > timeout. 
>> 
>> I was wrong about that.  The child processes are still legitimately 
>> running.  As fate would have it just as I started this email, I was pulled 
>> in to some mail server issues and noticed that the respawnhi thing had 
>> happened again.  All of the couriersmtp processes were stuck in a read() 
>> system call on fd 5.  I have the control file from a couple, and there are 
>> lots of DNS failures recorded. 
>> 
>> It's much too late to do any debugging right now, but I'll be all over this 
>> tomorrow.  In any case, it's not that courierd isn't harvesting children,
>> it's that the children are blocking on an unprotected read().  (I thought
>> they all had alarms in place...  /me shrugs)
> 
> I checked for any running processes, but I couldn't find any. I do have
> lots of courier-related processes running (authdaemon, pop and imap), so I
> may have missed one. 
> 
> Either way, my system sat for 6 hours or so doing nothing. If you're
> right that there was a process still running, something is missing a
> timeout. 
> 
> I wonder what the longest timeout is. Presumably respawnhi could be hit
> right after a legitimate process is spawned that then has to wait out a
> timeout on a client, so there will always be a chance that courier just
> stops delivering email for a while. 
> 
> respawnhi seems to need some sort of timeout, even if it's extremely
> long.

The server is designed to restart itself only when no mail is pending. 

The problem is that the client should not be stuck like that.  There's a 
select() before every read from the socket, so if anything, it should be 
stuck in a select(). 
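For reference, here is a minimal sketch of the general pattern described above: a read() that is only issued after select() reports the descriptor readable, with a timeout. This is not Courier's actual code; the helper name timed_read, the per-call timeout in seconds, and reporting expiry as ETIMEDOUT are all illustrative assumptions.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from Courier): wait up to timeout_sec seconds
 * for fd to become readable, then read from it.
 * Returns the number of bytes read, 0 on EOF, or -1 with errno set.
 * If nothing arrives before the timer expires, errno is ETIMEDOUT. */
ssize_t timed_read(int fd, void *buf, size_t len, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;   /* select() may modify tv, so reset it */
        tv.tv_usec = 0;

        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: wait again
                                      (note: the full timeout restarts) */
            return -1;
        }
        if (rc == 0) {
            errno = ETIMEDOUT;     /* peer sent nothing in time */
            return -1;
        }
        /* fd is readable, so this read() returns data or EOF promptly */
        return read(fd, buf, len);
    }
}
```

With this guard in place, a stalled peer leaves the process waiting in select() until the timer fires, rather than blocking indefinitely in a bare read().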

Get the date of the stuck message, and review your logs to see if there are 
any errors in syslog around that time, or a little bit later. 

-- 
Sam 


_______________________________________________
courier-users mailing list
[EMAIL PROTECTED]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
