Greetings... Please forgive me if I misstep here; this is my first
post to a public project such as this. You can probably skip all the
preamble and just look at the patch below ;)

The SYMPTOM:

   Getting "spam acl condition: Resource temporarily unavailable on
spamd socket" during WITH_CONTENT_SCAN (exiscan-style) spam scanning
on Mac OS X, causing a DEFER result in the ACL (or a PASS if one uses
/defer_ok). One also sees "Warning: ACL "warn" statement skipped:
condition test deferred" because of the deferred condition. This
might date all the way back to version 4.52 and the "poll() on OS X"
fix. In the spamd log, one notes a correlated "spamd: bad protocol:
header error: (Content-Length mismatch: Expected <actualMessageSize>
bytes, got <somethingSmaller> bytes)". I can reproduce the bug simply
by sending a large message (>60K with my TCP settings) through spam
scanning. A DEFER results in endless "temporarily rejected after DATA"
responses (451 SMTP result code?), or lets occasional spam through if
one uses /defer_ok on the DENY ACL.

The CAUSE:
   In spam.c, after a connection is established and the header is
sent, the socket is put into non-blocking mode and all the data is
written to it as fast as possible, yet EAGAIN is treated as a hard
error (EAGAIN = "Resource temporarily unavailable"). In fact, EAGAIN
is expected on a non-blocking socket, particularly for large
messages, since they tend to fill up the outgoing kernel buffers
before the data reaches the network (or before spamd/perl is able to
read it on the other end).
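To illustrate why EAGAIN is routine rather than exceptional, here is a
small stand-alone sketch. It is not taken from spam.c, and the function
name fill_until_full is made up for illustration. A local socketpair()
stands in for the spamd socket; the sending end is made non-blocking
and nobody reads the other end, so once the kernel buffer is full,
send() fails with EAGAIN (or EWOULDBLOCK) even though nothing is wrong.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write into a non-blocking socket that nobody is reading, the way a
 * large message can outrun spamd.  Returns the number of bytes the
 * kernel accepted before send() failed and stores that errno in *err,
 * or returns -1 if the socket pair could not be set up. */
long fill_until_full(int *err)
{
  int sv[2];
  char buf[4096];
  long total = 0;

  assert(err != NULL);
  *err = 0;
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
    *err = errno;
    return -1;
  }

  /* Non-blocking, like spamd_sock after the header has been sent. */
  int flags = fcntl(sv[0], F_GETFL);
  if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
    *err = errno;
    close(sv[0]);
    close(sv[1]);
    return -1;
  }

  memset(buf, 'x', sizeof(buf));
  for (;;) {
    ssize_t wrote = send(sv[0], buf, sizeof(buf), 0);
    if (wrote == -1) {
      *err = errno;   /* expected: EAGAIN (or EWOULDBLOCK) */
      break;
    }
    total += wrote;   /* short writes are counted too */
  }

  close(sv[0]);
  close(sv[1]);
  return total;
}
```

Calling fill_until_full(&err) should return a positive byte count with
err set to EAGAIN/EWOULDBLOCK; the exact count depends on the system's
socket buffer sizes, which is why only large messages trip the bug.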

The CURE:
   To avoid re-introducing blocking, avoid a CPU-burning loop, and
avoid the spotty implementations of poll() on OS X, one can simply
use the natural BSD-style select() instead! In fact, exim itself uses
select() rather than poll(), albeit for reading rather than sending.
Of course, select() is not identical to poll(), but for this simple
use it is close enough for an easy swap; even the return values and
expected errors match closely. I'm using this successfully in
production on 10.4 as we speak. I would expect select() to be solid
on all past and future OS X versions, other BSDs, and most other
operating systems too. I think one is supposed to #include
<sys/select.h> to be POSIX-correct, but exim only bothers with that
for QNX. In any case, #include "exim.h" covers it fine.
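For reference, the select()-based wait in the patch boils down to
something like the following sketch. The wrapper name wait_writable is
hypothetical; the 1-second timeout used with it in the patch and the
re-initialisation of the timeval and fd_set before every call follow
the patch itself.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for fd to become writable, using
 * select() instead of poll().
 * Returns >0 if fd is writable, 0 on timeout, -1 on error (errno set;
 * EINTR means a signal arrived and the caller should simply retry). */
int wait_writable(int fd, long timeout_sec)
{
  struct timeval tv;
  fd_set wfds;

  /* fd_set is a fixed-size bitmap: FD_SET() on an fd >= FD_SETSIZE is
   * undefined behaviour, a limit that poll() does not have. */
  assert(fd >= 0 && fd < FD_SETSIZE);

  /* Rebuild both arguments on every call: select() may overwrite the
   * fd_set and, on some systems (e.g. Linux), the timeout as well. */
  tv.tv_sec = timeout_sec;
  tv.tv_usec = 0;
  FD_ZERO(&wfds);
  FD_SET(fd, &wfds);

  /* Only the write set is passed; read and except sets are NULL. */
  return select(fd + 1, NULL, &wfds, NULL, &tv);
}
```

The result can be tested exactly like poll()'s: result < 1 means
timeout or error, and result == -1 with errno == EINTR means "try
again". The main practical difference is the FD_SETSIZE ceiling noted
in the comment.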

One might also rename NO_POLL_H to USE_SELECT_NOT_POLL or something
similar. Or, if possible, one could drop the use of poll() entirely:
everywhere else, exim uses only select(), albeit only for reading. Is
poll() somehow consistent across OSes for readability but not for
writability? I do see that nmap uses select() to monitor read, write
and exceptional conditions, and that seems to work for people across
enough platforms.

On a separate note: when checking the result of poll() or select()
and dealing with a possible signal interruption (EINTR), the code
should probably "goto again;" rather than "continue;". I think
"continue" will lose the current buffer's unsent data and corrupt
the message that makes it out to spamd! Coincidentally, this would
also produce the same symptoms described above, since less data would
be sent than was announced in the header. It is a rare case, but the
fix is also included in the patch below.
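The difference between "continue" and "goto again" is easiest to see
in a self-contained send loop. The sketch below (send_all is a
hypothetical helper, not exim code) keeps the unsent offset across
retries, so an interrupted select() waits again for the same buffer
instead of moving on. Unlike the patch, it also treats EAGAIN from
send() itself as "wait and retry", which goes beyond what the patch
does; the overall transaction timeout that spam.c enforces is left out
for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Send all len bytes of buf, waiting with select() before each send().
 * Returns 0 on success, -1 on a select() timeout or a hard error. */
int send_all(int fd, const char *buf, size_t len)
{
  size_t offset = 0;

  assert(fd >= 0 && fd < FD_SETSIZE);
  while (offset < len) {
    struct timeval tv;
    fd_set wfds;
    int result;

    tv.tv_sec = 1;            /* same 1-second wait as the patch */
    tv.tv_usec = 0;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    result = select(fd + 1, NULL, &wfds, NULL, &tv);

    /* This "continue" restarts the wait for the SAME buffer (offset is
     * untouched), i.e. the effect of "goto again" in the patch. */
    if (result == -1 && errno == EINTR)
      continue;
    if (result < 1)
      return -1;              /* timeout (0) or select() error (-1) */

    ssize_t wrote = send(fd, buf + offset, len - offset, 0);
    if (wrote == -1) {
      if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        continue;             /* not fatal: wait and try again */
      return -1;
    }
    offset += (size_t)wrote;  /* short writes just advance the offset */
  }
  return 0;
}
```

In spam.c the equivalent state is "offset" within the current
spamd_buffer; skipping to the next buffer on EINTR is exactly what
produces the short Content-Length seen by spamd.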

REFERENCES:
   http://www.erlenstar.demon.co.uk/unix/faq_3.html#SEC29
   http://www.hmug.org/man/2/select.php
   http://seth.positivism.org/man.cgi/select
   http://developer.apple.com/technotes/tn2002/tn2071.html


PATCH against 4.62 (I used 'diff -cd'; 'patch -p0' from the exim
directory should work):

*** src/spam.old        Wed Jun  7 15:20:42 2006
--- src/spam.c  Wed Jun  7 15:45:04 2006
***************
*** 42,47 ****
--- 42,50 ----
     struct sockaddr_un server;
   #ifndef NO_POLL_H
     struct pollfd pollfd;
+ #else
+   struct timeval select_tv;
+   fd_set select_fd;
   #endif

     /* stop compiler warning */
***************
*** 218,224 ****
      * and we poll the desciptor to make sure that we can write without
      * blocking.  Short writes are gracefully handled and if the whole
      * trasaction takes too long it is aborted.
!    * Note: poll() is not supported in OSX 10.2.
      */
   #ifndef NO_POLL_H
     pollfd.fd = spamd_sock;
--- 221,228 ----
      * and we poll the desciptor to make sure that we can write without
      * blocking.  Short writes are gracefully handled and if the whole
      * trasaction takes too long it is aborted.
!    * Note: poll() is not supported in OSX 10.2 and is reported to be
!    *       broken in more recent versions (up to 10.4).
      */
   #ifndef NO_POLL_H
     pollfd.fd = spamd_sock;
***************
*** 232,239 ****
   again:
   #ifndef NO_POLL_H
         result = poll(&pollfd, 1, 1000);
         if (result == -1 && errno == EINTR)
!         continue;
         else if (result < 1) {
           if (result == -1)
             log_write(0, LOG_MAIN|LOG_PANIC,
--- 236,250 ----
   again:
   #ifndef NO_POLL_H
         result = poll(&pollfd, 1, 1000);
+ #else
+       select_tv.tv_sec = 1;
+       select_tv.tv_usec = 0;
+       FD_ZERO(&select_fd);
+       FD_SET(spamd_sock, &select_fd);
+       result = select(spamd_sock+1, NULL, &select_fd, NULL, &select_tv);
+ #endif
         if (result == -1 && errno == EINTR)
!         goto again;
         else if (result < 1) {
           if (result == -1)
             log_write(0, LOG_MAIN|LOG_PANIC,
***************
*** 248,254 ****
           (void)fclose(mbox_file);
           return DEFER;
         }
- #endif
         wrote = send(spamd_sock,spamd_buffer + offset,read - offset,0);
         if (wrote == -1)
         {
--- 259,264 ----


-- 
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details 
at http://www.exim.org/ ##
