>Number: 4787
>Category: general
>Synopsis: tcp keepalive and ExitAfterIdleTimeout
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: apache
>State: open
>Class: change-request
>Submitter-Id: apache
>Arrival-Date: Wed Jul 28 14:10:00 PDT 1999
>Last-Modified:
>Originator: [EMAIL PROTECTED]
>Organization: apache
>Release: 1.3.6
>Environment: SunOS mauro 5.6 Generic_105181-11 sun4u sparc SUNW,Ultra-5_10
>Description:
We are using Apache in a firewall environment, more precisely between two firewalls (in the DMZ). There we had the problem that the firewall silently drops idle TCP connections (as if the wire had been cut) after a certain amount of time (about 60 to 90 minutes).
If a child process holds open TCP connections that stay idle for a long time, the process hangs forever, because the firewall does not send a reset and the TCP implementation does not notice that the connection is gone. This can happen in two situations (for each of which I propose a patch):

1. At least on Solaris, the SO_KEEPALIVE option set on the listener socket is NOT inherited by the newly accepted socket. So if, for example, an application does a 'server push' that can be idle for a long time, you get a hanging child. The same can happen if a module does not use the *timeout calls properly.

2. You use some form of 'static' connections to backend services (e.g. a DB). In this situation the server will hang if it does not use SO_KEEPALIVE, and the child must wait a very long time (about 8 minutes) on the next call to the backend service, because it takes that long to recognise and shut down a broken TCP connection.

To solve case 1, I set SO_KEEPALIVE just after the accept() call. For the second case I added a new feature, 'ExitAfterIdleTimeout', which terminates an idle child after a certain amount of time (which should be less than the firewall timeout). This solves case 2, and it also provides a means of letting the number of idle children slowly decrease from MaxIdle to MinIdle (in a period of low load).
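The fix for case 1 can be sketched outside of Apache as follows. This is only a minimal illustration, not the patch itself: the function names enable_keepalive, keepalive_enabled and accept_with_keepalive are hypothetical helpers invented for this sketch. Unlike the patch, the sketch calls setsockopt() only when accept() has returned a valid descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn SO_KEEPALIVE on for fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one));
}

/* Hypothetical helper: report whether SO_KEEPALIVE is set on fd.
 * Returns 1 if on, 0 if off, -1 on error. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    assert(len == sizeof(val)); /* the option value is an int */
    return val != 0;
}

/* Hypothetical helper: accept a connection, retrying on EINTR, and set
 * SO_KEEPALIVE explicitly instead of relying on inheritance from the
 * listener. Returns the connected descriptor, or -1 on error. */
int accept_with_keepalive(int listener)
{
    int csd;

    do {
        csd = accept(listener, NULL, NULL);
    } while (csd < 0 && errno == EINTR);

    if (csd < 0)
        return -1;              /* real accept() failure */
    if (enable_keepalive(csd) < 0) {
        close(csd);             /* do not leak the descriptor */
        return -1;
    }
    return csd;
}
```

Note that keepalive probes only start after the system-wide idle interval (commonly two hours by default), so whether this alone beats a 60 to 90 minute firewall timeout depends on how the TCP stack is tuned.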
>How-To-Repeat:
Play around with firewalls and have patience.
>Fix:
see the following patches:

*** apache_1.3.6/src/main/http_core.c Sat Mar 20 00:54:08 1999
--- apache_1.3.6-2.3.9/src/main/http_core.c Tue Jul 27 22:38:35 1999
***************
*** 2650,2655 ****
--- 2650,2668 ----
  }
  #endif

+ #ifndef NO_ADN_ADDON
+ extern int ap_exit_after_idle_timeout; /* http_main.c */
+ static const char *set_idle_timeout (cmd_parms *cmd, void *dummy, char *arg)
+ {
+     const char *err = ap_check_cmd_context(cmd, GLOBAL_ONLY);
+     if (err != NULL) {
+         return err;
+     }
+     ap_exit_after_idle_timeout = atoi (arg);
+     return NULL;
+ }
+ #endif /* NO_ADN_ADDON */
+
  /* Note --- ErrorDocument will now work from .htaccess files.
   * The AllowOverride of Fileinfo allows webmasters to turn it off
   */
***************
*** 2875,2880 ****
--- 2888,2897 ----
    (void*)XtOffsetOf(core_dir_config, limit_req_body),
    OR_ALL, TAKE1,
    "Limit (in bytes) on maximum size of request message body" },
+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ { "ExitAfterIdleTimeout", set_idle_timeout, NULL, RSRC_CONF, TAKE1,
+   "Maximum number of seconds a particular child server stays idle before dying. (0 => never)" },
+ #endif /* NO_ADN_ADDON */
  { NULL }
  };

*** apache_1.3.6/src/main/http_main.c Tue Jul 27 22:22:08 1999
--- apache_1.3.6-2.3.9/src/main/http_main.c Tue Jul 27 22:38:37 1999
***************
*** 253,258 ****
--- 253,263 ----
  API_VAR_EXPORT ap_ctx *ap_global_ctx;
  #endif /* EAPI */

+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ /* configured in http_config.c */
+ int ap_exit_after_idle_timeout = 0;
+ #endif /* NO_ADN_ADDON */
+
  /*
   * The max child slot ever assigned, preserved across restarts. Necessary
   * to deal with MaxClients changes across SIGUSR1 restarts. We use this
***************
*** 3714,3719 ****
--- 3719,3731 ----
      (void) ap_update_child_status(my_child_num, SERVER_READY,
                                    (request_rec *) NULL);

+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+     if (ap_exit_after_idle_timeout) {
+         signal(SIGALRM, (void (*)())just_die);
+         alarm(ap_exit_after_idle_timeout);
+     }
+ #endif /* NO_ADN_ADDON */
+
      /*
       * Wait for an acceptable connection to arrive.
       */
***************
*** 3760,3766 ****
--- 3772,3796 ----
          clen = sizeof(sa_client);
          csd = accept(sd, &sa_client, &clen);
          if (csd >= 0 || errno != EINTR)
+ #ifndef NO_ADN_ADDON /* accepted socket keepalive */
+ /** code changes by AdNovum (sgw) 4/28/98 **/
+ /** It seems like the KEEPALIVE on the listener socket **/
+ /** is not properly propagated over an accept(). **/
+ /** In order to fix that problem we set it again here. **/
+         {
+             int one = 1;
+             if (setsockopt(csd, SOL_SOCKET,SO_KEEPALIVE,
+                            (char *)&one,sizeof(int)) < 0) {
+                 ap_log_unixerr("setsockopt", "(SO_KEEPALIVE)", NULL,
+                                server_conf);
+                 exit(1);
+             }
+             break;
+ /** end of AdNovum fix for KEEPALICE **/
+         }
+ #else /* NO_ADN_ADDON */
              break;
+ #endif /* NO_ADN_ADDON */
          if (deferred_die) {
              /* we didn't get a socket, and we were told to die */
              clean_child_exit(0);
***************
*** 3847,3852 ****
--- 3877,3887 ----

      SAFE_ACCEPT(accept_mutex_off()); /* unlock after "accept" */

+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+     /* reset any alarm signals */
+     alarm(0);
+ #endif /* NO_ADN_ADDON */
+
      /* We've got a socket, let's at least process one request off the
       * socket before we accept a graceful restart request.
       */
***************
*** 4009,4014 ****
--- 4044,4055 ----
      }

      if (one_process) {
+
+ #ifndef NO_ADN_ADDON /* debug support */
+     /* added by AdNovum (tl) to prevent timeouts on open tcp sessions **/
+     s->keep_alive = 0;
+ #endif /* NO_ADN_ADDON */
+
      signal(SIGHUP, just_die);
      signal(SIGINT, just_die);
      signal(SIGQUIT, SIG_DFL);
>Audit-Trail:
>Unformatted:
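The idle-shutdown mechanism in the http_main.c hunks (arm an alarm before waiting for a connection, cancel it once a connection arrives) can be sketched in isolation as below. This is a hedged illustration, not the patch: arm_idle_timeout, disarm_idle_timeout and the idle_expired flag are hypothetical names, and the handler only sets a flag where the patch lets the child exit via just_die.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>   /* assert(): for caller-side sanity checks */
#include <signal.h>
#include <unistd.h>

/* Set by the handler when the idle period has elapsed (hypothetical). */
volatile sig_atomic_t idle_expired = 0;

static void on_idle_alarm(int sig)
{
    (void)sig;
    idle_expired = 1;   /* the patch exits the child here instead */
}

/* Arm the idle timer before blocking in accept().
 * seconds == 0 means "never", as in the proposed directive.
 * Returns 0 on success, -1 if the handler could not be installed. */
int arm_idle_timeout(unsigned int seconds)
{
    struct sigaction sa;

    if (seconds == 0)
        return 0;               /* feature disabled, nothing to arm */

    sa.sa_handler = on_idle_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: accept() fails with EINTR */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    alarm(seconds);
    return 0;
}

/* Cancel the timer once a connection has been accepted.
 * Returns the number of seconds that were still left (0 if none pending). */
unsigned int disarm_idle_timeout(void)
{
    return alarm(0);
}
```

With a firewall timeout of about 60 minutes, a configuration line such as "ExitAfterIdleTimeout 3000" (an example value, not one taken from the report) would let idle children exit before the firewall drops their connections.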