--On Thursday, May 02, 2002 9:52 AM +1000 Jeremy Howard 
<[EMAIL PROTECTED]> wrote:

> I've seen a couple of problems over the last few weeks with master
> apparently failing to correctly maintain the prefork pool. We
> particularly see this problem with pop3d, which has more
> connects/disconnects than IMAP because of the nature of the protocol.
>
> The first issue is that in shut_down() sockets are not closed. It seems
> that this can leave sockets in CLOSE_WAIT state in certain error
> situations where popd_reset() is not called.
>
> The second issue is that we sometimes see sockets remain in a CLOSE_WAIT
> state because there is still data to be read. It appears that prot_fill()
> should be called in popd_reset() and shut_down().
>
> The third issue is that when a process fails to shutdown correctly, such
> as if it segfaults, master does not seem to correctly keep track of the
> child process count. As a result, eventually the pool runs out and no
> more connections are accepted.
>
> Do the resolutions to the first two issues sound correct (we have made
> these changes and it seems to have fixed things for us)? Does anyone have
> a fix for the third issue?

YES!  I believe you have hit the nail on the head on all 3 of the issues
above!  Good job!
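
For reference, here is a rough sketch of what the first two fixes amount to
at the descriptor level. It is only an illustration: drain_and_close() is a
made-up helper, not a Cyrus function, and it uses plain read() where the real
code would go through the prot layer (prot_fill()).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of Cyrus): throw away whatever input is
 * already pending on fd, then close it so the socket cannot linger in
 * CLOSE_WAIT. Returns the number of bytes discarded. */
long drain_and_close(int fd)
{
    char buf[4096];
    long discarded = 0;
    int flags;

    assert(fd >= 0);

    /* Switch to non-blocking mode so the drain loop cannot hang when
     * no data is pending. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags != -1)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            discarded += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* 0 = EOF; -1 = EAGAIN or a real error */
    }

    close(fd);                  /* the step shut_down() was missing */
    return discarded;
}
```

A service would call something like this from both its reset path and its
shutdown path, so the error paths that skip the reset still close the socket.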

We have been particularly bitten by the third issue with the master process
losing track of the number of child processes in each service maintained.
There has to be a better way for the master process to manage its children.
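
As a rough sketch of the bookkeeping I would expect (this is not the actual
master code; struct service_counts, spawn_child() and reap_children() are
names made up for illustration, and a single service is assumed), the count
goes up at fork() and comes down for every child that waitpid() reports,
whether it exited cleanly or died from a signal:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-service bookkeeping; the real master's structures differ. */
struct service_counts {
    int nactive;    /* children believed to be alive */
    int ncrashed;   /* children that were killed by a signal */
};

/* Stand-in for starting a service child. If crash is non-zero the child
 * calls abort() to simulate a segfault-style death. */
pid_t spawn_child(struct service_counts *s, int crash)
{
    pid_t pid;

    assert(s != NULL);
    pid = fork();
    if (pid == 0) {
        if (crash)
            abort();
        _exit(0);
    }
    if (pid > 0)
        s->nactive++;           /* count the child only if fork() worked */
    return pid;
}

/* Reap every child that has terminated so far, without blocking.
 * Intended to run from the main loop after SIGCHLD. Returns the number
 * of children reaped on this call. */
int reap_children(struct service_counts *s)
{
    int status, reaped = 0;

    assert(s != NULL);
    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            /* Decrement for abnormal deaths too, not just clean exits. */
            if (s->nactive > 0)
                s->nactive--;
            if (WIFSIGNALED(status))
                s->ncrashed++;
            reaped++;
        } else if (pid == -1 && errno == EINTR) {
            continue;           /* interrupted, try again */
        } else {
            break;              /* 0: nothing ready; -1: no children left */
        }
    }
    return reaped;
}
```

The loop matters because several children can die while only one SIGCHLD is
delivered; calling waitpid() once per signal would undercount.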

I was thinking that it would be nice if the Cyrus server used shared memory
to keep track of the children: which ones are active or idle, which ones
haven't checked in with the master in a while (a possible problem), etc.  If
the master had an incoming client "queue", the children could pick up the
next client and run with it.  Furthermore, all the client information, such
as the number of clients each child has handled, would be available to the
master, which is the only Cyrus process that currently has SNMP support.
This would make SNMP stats far more useful.

Anyway, I wish I had the time right now to dive into the master/child
communication problem, but I am glad somebody else has seen the problem too!

Scott
--
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
      Scott W. Adkins                http://www.cns.ohiou.edu/~sadkins/
   UNIX Systems Engineer                  mailto:[EMAIL PROTECTED]
        ICQ 7626282                 Work (740)593-9478 Fax (740)593-1944
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
     PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/

