Third attempt.

Hi,

We experienced severe performance problems with imap-2002e on AIX 5.1, running on an SP2 node connected via a High Performance Switch (HPFS). The problem showed up during peak hours: within minutes performance dropped to zero (the server no longer responded) and we had 700+ imapd processes running (normally about 150). netstat showed that most of them were in CLOSE_WAIT. Recovery was only possible by stopping inetd, to prevent the creation of more imapd processes, and killing the CLOSE_WAIT processes, or by rebooting the system, only to find ourselves in the same spot 30 minutes later.

From traces we were able to establish the sequence of events leading to this situation:

1. The client opens a session and proceeds to SELECT INBOX.
2. The SELECT does not return within the timeout set by the client (1 minute).
3. The client drops the TCP session and receives an ACK. The imapd is still hanging in the SELECT.
4. The client opens a new TCP session and proceeds to SELECT INBOX. Of course the new imapd cannot obtain the lock, because it is still held by the previous session. Again the SELECT does not return within the timeout set by the client.
5. Points 3 and 4 are repeated again and again.
6. Eventually the SELECT from point 1 tries to respond, finds the TCP session gone and closes the socket. Only now are the messages that were generated while waiting for the lock sent, along with the FIN. The client responds with a reset because it had already dropped that session.
7. The number of sessions generated this way is bounded only by the resources available to the server.

Looking at the source, we found that the messages generated while waiting for the lock are written in imapd.c:mm_log. But because this is buffered I/O, the messages remain in the buffer, which in the case of the HPFS interface is rather large. So they do not reach the client until some other output fills the buffer or the socket is closed.

The quick and dirty solution was to add a PFLUSH(); statement after CRLF; in both the PARSE and WARN cases in mm_log(), to make sure out-of-band messages reach the client. Maybe there are other places in imapd where it is necessary to make sure a generated message gets on the wire immediately; after all, there is a dialog going on between the client and the server.

After applying the change, the "waiting for the lock" messages reached the client, causing it to reset its timeout and no longer drop its TCP session. In the 4 weeks of running with the change applied, the system never went into the state described above. We also did not notice any problems introduced by the change. But maybe you know a better solution.

Kind regards,
Stefan Vogel, Paul Tedaldi

--
------------------------------------------------------------------
For information about this mailing list, and its archives, see:
http://www.washington.edu/imap/c-client-list.html
------------------------------------------------------------------
