On systems that can accept new CGI requests faster than the daemon can
fork, it is easy to fill the listen queue on the Unix socket and get
ECONNREFUSED, resulting in a 500 error going back to the browser.  It
is alarming to see a zillion requests for the same resource in the
access log, with large numbers of them getting 500 instead of 200.

While I suspect that it is appropriate to add a directive to control
the listen queue size, that is not sufficient.  As long as we are able
to fill the queue with the current limit we'll always be able to fill
the queue...  It'll just take a little (?) longer.

This patch retries the connect after a timeout, up to a limit on the
number of retries.  I suspect that just as we may need a directive for
the listen queue size we'll also want a directive for the retry limit.

I noticed some socket leakage on error paths.  One fix is entangled in
the patch below.  Probably I will attack the leakage first and then
come back and deal with the ECONNREFUSED issue.

Index: modules/generators/mod_cgid.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/generators/mod_cgid.c,v
retrieving revision 1.110
diff -u -r1.110 mod_cgid.c
--- modules/generators/mod_cgid.c       8 Jan 2002 06:26:09 -0000       1.110
+++ modules/generators/mod_cgid.c       28 Jan 2002 11:36:43 -0000
@@ -160,6 +160,16 @@
 #define DEFAULT_CGID_LISTENBACKLOG 100
 #endif
 
+/* DEFAULT_CONNECT_ATTEMPTS controls how many times we'll try to connect
+ * to the cgi daemon from the thread/process handling the cgi request.
+ * Generally we want to retry when we get ECONNREFUSED since it is
+ * probably because the listen queue is full.  We need to try harder so
+ * the client doesn't see it as a 500 error.
+ */
+#ifndef DEFAULT_CONNECT_ATTEMPTS
+#define DEFAULT_CONNECT_ATTEMPTS  15
+#endif
+
 typedef struct { 
     const char *sockname;
     const char *logname; 
@@ -863,6 +873,8 @@
     struct sockaddr_un unix_addr;
     apr_file_t *tempsock;
     apr_size_t nbytes;
+    int connect_tries;
+    apr_interval_time_t sliding_timer;
 
     if(strcmp(r->handler,CGI_MAGIC_TYPE) && strcmp(r->handler,"cgi-script"))
        return DECLINED;
@@ -923,23 +935,46 @@
     ap_add_cgi_vars(r); 
     env = ap_create_environment(r->pool, r->subprocess_env); 
 
-    if ((sd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
-            return log_scripterror(r, conf, HTTP_INTERNAL_SERVER_ERROR, errno, 
-                                   "unable to create socket to cgi daemon");
-    } 
     memset(&unix_addr, 0, sizeof(unix_addr));
     unix_addr.sun_family = AF_UNIX;
     strcpy(unix_addr.sun_path, conf->sockname);
 
-    if (connect(sd, (struct sockaddr *)&unix_addr, sizeof(unix_addr)) < 0) {
+    connect_tries = 0;
+    sliding_timer = 100000; /* 100 milliseconds */
+    while (1) {
+        ++connect_tries;
+        if ((sd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
             return log_scripterror(r, conf, HTTP_INTERNAL_SERVER_ERROR, errno, 
-                                   "unable to connect to cgi daemon");
-    } 
+                                   "unable to create socket to cgi daemon");
+        }
+        if (connect(sd, (struct sockaddr *)&unix_addr, sizeof(unix_addr)) < 0) {
+            if (errno == ECONNREFUSED && connect_tries < DEFAULT_CONNECT_ATTEMPTS) {
+                ap_log_rerror(APLOG_MARK, APLOG_DEBUG, errno, r,
+                              "connect #%d to cgi daemon failed, sleeping before 
+retry",
+                              connect_tries);
+                close(sd);
+                apr_sleep(sliding_timer);
+                if (sliding_timer < 2 * APR_USEC_PER_SEC) {
+                    sliding_timer *= 2;
+                }
+            }
+            else {
+                close(sd);
+                return log_scripterror(r, conf, HTTP_INTERNAL_SERVER_ERROR, errno, 
+                                       "unable to connect to cgi daemon after 
+multiple tries");
+            }
+        }
+        else {
+            break; /* we got connected! */
+        }
+    }
 
     send_req(sd, r, argv0, env, CGI_REQ); 
 
     /* We are putting the tempsock variable into a file so that we can use
      * a pipe bucket to send the data to the client.
+     * Note: apr_os_file_put does not establish a cleanup, so we better be
+     * careful.
      */
     apr_os_file_put(&tempsock, &sd, 0, r->pool);
 

-- 
Jeff Trawick | [EMAIL PROTECTED] | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Reply via email to