( Greetings... Please forgive me if I misstep here. This is my first post to a public project such as this. You can probably just skip all the excess here and look at the patch below ;)
The SYMPTOM: Getting "spam acl condition: Resource temporarily unavailable on spamd socket" during WITH_CONTENT_SCAN (exiscan style) spam scanning on Mac OS X causing a DEFER result to the ACL (or PASS if one uses / defer_ok). One also sees "Warning: ACL "warn" statement skipped: condition test deferred" due to the condition. This might date all the way back to version 4.52 with the "poll() on OS X" fix. In the spamd log, one notes a correlated "spamd: bad protocol: header error: (Content-Length mismatch: Expected <actualMessageSize> bytes, got <somethingSmaller> bytes)". I personally can reproduce the bug simply by sending a big message through the spam scanning (>60K with my TCP settings). A DEFER results in endless "temporarily rejected after DATA" (451 SMTP result code?) or lets occasional spam through if one uses /defer_ok on the DENY ACL. The CAUSE: In spam.c after a connection is established and the header is sent, the socket is put into non-blocking mode, then all data is rushed to it, yet EAGAIN is treated as a hard error (EAGAIN = Resource temporarily unavailable). In fact, EAGAIN is expected to occur -- particularly for large messages since they'll tend to fill up the outgoing low level buffers before the data reaches the network (or spamd/perl is able to suck it up on the other end?). The CURE: To avoid re-introducing blocking, avoid a CPU burning loop, and avoid the spotty implementations of poll() on OS X, one can just use the natural BSDish select() instead! In fact, exim itself uses select () rather than poll() -- albeit for reading rather than sending. Of course, select() is not identical to poll() but for this simple use it happens to be close enough for an easy swap. Even the result code and expected errors are close enough. I'm using this successfully in production on 10.4 as we speak. I would expect select() to be solid on all past and future OS X versions, other BSD's, and most other OS too. I think one is supposed to #include <sys/select.h> to be posixly correct, but exim only bothers with that for QNX. In any case, #include exim.h covers it fine. One might also rename NO_POLL_H to be USE_SELECT_NOT_POLL or something similar. Or, if it is possible, one should drop the use of poll(). Exim solely uses select(), albeit only for reading. Is it somehow consistent across OS's for polling readability but not for writability? I do see that 'nmap' uses select() for read, write and extra monitoring, and that seems to work for people across enough platforms. On a separate note in checking the result of the poll() or select(), and dealing with potential "signal" interruption, it probably should "goto again;" rather than "continue;" I think "continue" will lose the current unsent buffer's worth of data and corrupt the message that makes it out to spamd! Coincidentally, this would also precipitate the same symptoms described above since less data would be sent than was announced in the header. A rare case, but also included in the patch below. REFERENCES: http://www.erlenstar.demon.co.uk/unix/faq_3.html#SEC29 http://www.hmug.org/man/2/select.php http://seth.positivism.org/man.cgi/select http://developer.apple.com/technotes/tn2002/tn2071.html PATCH against 4.62, (I used 'diff -cd', patch -p 0 from exim directory should work): *** src/spam.old Wed Jun 7 15:20:42 2006 --- src/spam.c Wed Jun 7 15:45:04 2006 *************** *** 42,47 **** --- 42,50 ---- struct sockaddr_un server; #ifndef NO_POLL_H struct pollfd pollfd; + #else + struct timeval select_tv; + fd_set select_fd; #endif /* stop compiler warning */ *************** *** 218,224 **** * and we poll the desciptor to make sure that we can write without * blocking. Short writes are gracefully handled and if the whole * trasaction takes too long it is aborted. ! * Note: poll() is not supported in OSX 10.2. */ #ifndef NO_POLL_H pollfd.fd = spamd_sock; --- 221,228 ---- * and we poll the desciptor to make sure that we can write without * blocking. Short writes are gracefully handled and if the whole * trasaction takes too long it is aborted. ! * Note: poll() is not supported in OSX 10.2 and is reported to be ! * broken in more recent versions (up to 10.4). */ #ifndef NO_POLL_H pollfd.fd = spamd_sock; *************** *** 232,239 **** again: #ifndef NO_POLL_H result = poll(&pollfd, 1, 1000); if (result == -1 && errno == EINTR) ! continue; else if (result < 1) { if (result == -1) log_write(0, LOG_MAIN|LOG_PANIC, --- 236,250 ---- again: #ifndef NO_POLL_H result = poll(&pollfd, 1, 1000); + #else + select_tv.tv_sec = 1; + select_tv.tv_usec = 0; + FD_ZERO(&select_fd); + FD_SET(spamd_sock, &select_fd); + result = select(spamd_sock+1, NULL, &select_fd, NULL, &select_tv); + #endif if (result == -1 && errno == EINTR) ! goto again; else if (result < 1) { if (result == -1) log_write(0, LOG_MAIN|LOG_PANIC, *************** *** 248,254 **** (void)fclose(mbox_file); return DEFER; } - #endif wrote = send(spamd_sock,spamd_buffer + offset,read - offset,0); if (wrote == -1) { --- 259,264 ---- -- ## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
