third attempt.

Hi,

We experienced severe performance problems with imap-2002e on AIX 5.1
running on an SP2 node connected with a High Performance Switch (HPFS).
The problem showed up during peak hours, when within minutes performance
dropped to zero (the server no longer responded) and we had 700+ imapd
processes running (normally about 150). netstat showed that most of them
were in CLOSE_WAIT. Recovery was possible only by stopping inetd to prevent
the creation of more imapd processes and killing the CLOSE_WAIT processes,
or by rebooting the system, only to find ourselves in the same spot 30
minutes later. By tracing we were able to establish the sequence of events
leading to this situation.

1. Client opens a session and proceeds to SELECT the inbox.
2. The SELECT does not return within the timeout set by the client
   (1 minute).
3. Client drops the TCP session and receives an ACK.
   The imapd is still hanging in the SELECT.
4. Client opens a new TCP session and proceeds to SELECT the inbox.
   Of course the new imapd cannot obtain the lock because it is still
   held by the previous session. Again the SELECT does not return within
   the timeout set by the client.
5. Steps 3 and 4 are repeated again and again.
6. Eventually the SELECT from step 1 tries to respond, finds the TCP
   session gone, and closes the socket. Only now are the messages
   generated while waiting for the lock sent, along with the FIN.
   The client responds with a reset because it had already dropped that
   session.
7. The number of sessions generated this way is bounded only by the
   resources available to the server.

Looking at the source, we found that the messages generated while waiting
for the lock are written in imapd.c:mm_log. Because this is buffered I/O,
the messages remain in the buffer, which in the case of the HPFS interface
is rather large. They therefore do not reach the client until some other
output fills the buffer or the socket is closed.
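
The buffering effect can be reproduced in isolation. The following is a
minimal sketch and is not taken from imapd: it assumes a POSIX system, uses
a pipe in place of the client connection, and the names buffered_write_demo,
bytes_visible and DEMO_MSG, the message text, and the 8192-byte buffer size
are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative message only; not the exact text imapd emits. */
#define DEMO_MSG "* NO waiting for lock\r\n"

/* Number of bytes the reader can see right now (0 if none).
   The descriptor must be in non-blocking mode. */
static long bytes_visible(int fd)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    return n > 0 ? (long)n : 0;
}

/* Write one line through a fully buffered stdio stream and report how
   many bytes the peer sees before and after fflush().
   Returns 0 on success, -1 if the pipe or stream could not be set up. */
int buffered_write_demo(long *before_flush, long *after_flush)
{
    int fds[2];
    FILE *out;

    assert(before_flush != NULL && after_flush != NULL);
    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);      /* reads must not block */

    out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, 8192);        /* full buffering */

    fputs(DEMO_MSG, out);                    /* stays in the stdio buffer */
    *before_flush = bytes_visible(fds[0]);

    fflush(out);                             /* now handed to the kernel */
    *after_flush = bytes_visible(fds[0]);

    fclose(out);
    close(fds[0]);
    return 0;
}
```

Calling buffered_write_demo() should report nothing visible to the reader
before the flush and the whole line afterwards, which matches what we saw
in the traces: the text only appears on the wire once something forces the
buffer out.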

The quick and dirty solution was to add a PFLUSH(); statement after CRLF; in
both the PARSE and WARN cases in mm_log() to make sure out-of-band messages
reach the client.
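
For readers without the source at hand, the shape of the change is roughly
as follows. This is a standalone sketch, not the actual imapd.c code: the
SK_ macros only stand in for the output macros named above, the SK_PARSE and
SK_WARN values are placeholders rather than the c-client flag values, and
the function sketch_log with its FILE * parameter is hypothetical (it is
there only so the sketch can be tried on any stream).

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the imapd.c output macros; the real definitions may
   differ, so treat these as assumptions. */
#define SK_PSOUT(out, s) fputs((s), (out))
#define SK_CRLF(out)     SK_PSOUT((out), "\015\012")
#define SK_PFLUSH(out)   fflush(out)

/* Placeholder flag values for this sketch only. */
enum { SK_WARN = 1, SK_PARSE = 3 };

/* Emit an unsolicited response and flush it at once.
   Returns the result of the flush (0 on success), or 0 if the flag
   is not one of the two cases handled here. */
int sketch_log(FILE *out, const char *string, long errflg)
{
    assert(out != NULL && string != NULL);
    switch (errflg) {
    case SK_PARSE:                  /* parse glitch: unsolicited OK */
        SK_PSOUT(out, "* OK [PARSE] ");
        SK_PSOUT(out, string);
        SK_CRLF(out);
        return SK_PFLUSH(out);      /* the added flush */
    case SK_WARN:                   /* warning: unsolicited NO */
        SK_PSOUT(out, "* NO ");
        SK_PSOUT(out, string);
        SK_CRLF(out);
        return SK_PFLUSH(out);      /* the added flush */
    default:
        return 0;                   /* other cases are not shown here */
    }
}
```

The point of the sketch is only the placement: the flush follows the line
terminator in each of the two cases, so every such message leaves the
process as a complete line.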

Maybe there are other places in imapd where it is necessary to make sure
that a generated message gets on the wire immediately; after all, there is
a dialog going on between the client and the server.

After applying the change, the "waiting for the lock" messages reached the
client, causing it to reset its timeout and no longer drop its TCP session.

In the 4 weeks running with the change applied the system never went into the
state described above. We also did not notice any problems introduced by
the change. But maybe you know a better solution.

Kind regards
Stefan Vogel, Paul Tedaldi