On 10/05/2006 11:07 AM, Mathias Herberts wrote:
> As I mentioned a few days ago here are my patches for mod_proxy.
>
> They modify mod_proxy behaviour in several ways to better fit our
> production environment where we heavily use Apache and Tomcat.
>
> Just so you understand our context, we use a pool of Apache servers as
> front-ends to Tomcat application servers. Each Apache server serves
> several virtual hosts which connect (using AJP 1.3) to backend Tomcat
> servers.
>
> The version of Apache we use is 2.2 and the version of Tomcat 5.5.
Thank you very much for submitting the patches. Some general remarks:

1. Please create your patches against trunk. All patches need to be
   applied to trunk first and can be backported later to the stable
   branch(es). There are only very rare cases where the code in trunk
   has changed so much that a change cannot be applied to trunk first
   before it gets applied to a branch. This also helps you notice what
   has already been done. E.g. the "Retry a worker in error state
   before using the redirect worker" feature is already implemented in
   trunk. It has even already been backported to the 2.2.x branch and
   will be part of 2.2.4. If you spot a change in trunk that has not
   been backported and that you consider useful for a stable branch,
   you can ask here on the list for it to be backported.

2. Please split the patches so that each patch contains only one
   logical change / feature. E.g. the balancer.patch file contains two
   things:

   1. Allow load factor 0
   2. Retry a worker in error state before using the redirect worker

   This has the benefit that patches are easier to review, and that
   things which are undisputed can be applied sooner and independently
   of other things.

3. Please follow the code style guide
   (http://httpd.apache.org/dev/styleguide.html) when writing patches.
   Your patches violate this guide especially by using tabs instead of
   spaces, plus some other smaller things (missing spaces, wrong
   indentation).

4. Don't feel discouraged by the comments above :-). Although these may
   look like stupid hoops meant to torture new contributors, in fact
   they are not. As the developers with commit access are all
   volunteers, time is one of the most valuable resources here. The
   priority with which developers review patches also depends heavily
   on how easy a patch is to review and commit. As both sides, the
   project and the contributor, benefit from a timely review, it makes
   sense for the contributor to prepare the patch in a way that makes
   the review effective.
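As a quick self-check before resubmitting, the tab problem from point 3
is easy to catch mechanically. A minimal sketch (the filename
balancer.patch is just taken from point 2 as an example):

```shell
# Flag any added line of the patch that contains a literal tab
# character (the httpd style guide mandates spaces).
grep -n "$(printf '\t')" balancer.patch \
    && echo "tabs found: please convert to spaces before resubmitting"
```

The same one-liner works on any of the patch files, and running it
before posting saves one review round-trip.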
Now some more specific comments:

ajp.patch:

1. +        else if (!strcasecmp(key,"forceclose")) {
   +            /* Do not reuse backend connections */
   +            if (!strcasecmp(val, "on"))
   +                worker->force_close = 1;
   +            else if (!strcasecmp(val, "off"))
   +                worker->sticky_force = 0;

   Guess that should be worker->force_close = 0 above.

2. +++ ./modules/proxy/mod_proxy.h    2006-09-13 17:58:45.000000000 +0200
   @@ -289,6 +292,9 @@
        * may be available while exceeding the soft limit */
        apr_interval_time_t timeout; /* connection timeout */
        char timeout_set;
   +    apr_interval_time_t connect_timeout; /* connection timeout */
   +    char connect_timeout_set;
   +    int force_close; /* discard backend connection after use */
        apr_interval_time_t acquire; /* acquire timeout when the maximum
                                        number of connections is exceeded */
        char acquire_set;
        apr_size_t recv_buffer_size;

   @@ -323,6 +329,7 @@
        apr_array_header_t *workers; /* array of proxy_workers */
        const char *name; /* name of the load balancer */
        const char *sticky; /* sticky session identifier */
   +    int sticky_case; /* Ignore case when looking for sticky */
        int sticky_force; /* Disable failover for sticky sessions */
        apr_interval_time_t timeout; /* Timeout for waiting on free connection */
        int max_attempts; /* Number of attempts before failing */

   As already mentioned by someone else: if you add new members to a
   struct that is part of the public API, always add them at the end of
   the struct. Although over time this separates related struct members
   from each other, it is required for compatibility: within a stable
   branch, binary compatibility is guaranteed, and changes that break
   this compatibility cannot be backported from trunk to a stable
   branch. There is a versioning scheme for this:
   MODULE_MAGIC_NUMBER_MAJOR and MODULE_MAGIC_NUMBER_MINOR, which can
   be found in ap_mmn.h.
Breaking binary compatibility requires a change of
MODULE_MAGIC_NUMBER_MAJOR (a major bump). Changes that do not break
binary compatibility, but where third-party sources using an added
feature of the API (like the new struct members) need to require a
certain minimum version of the product, only need a change of
MODULE_MAGIC_NUMBER_MINOR (a minor bump). In the case of a major bump,
MODULE_MAGIC_NUMBER_MINOR is reset.

3. --- ./modules/proxy/mod_proxy_ajp.c.orig    2006-09-18 22:01:53.000000000 +0200
   +++ ./modules/proxy/mod_proxy_ajp.c    2006-09-13 17:52:48.000000000 +0200
   @@ -510,7 +510,7 @@
        }

        backend->is_ssl = 0;
   -    backend->close_on_recycle = 0;
   +    backend->close_on_recycle = worker->force_close;

   close_on_recycle is deprecated. Please use close instead.

4. Apart from the comments by others on the case insensitive session
   parameter, I do not think that the session parameter itself should
   be set via stickycasesession. I think this should be a parameter
   that turns case sensitive / case insensitive comparison on or off.

5. I see no need for the additional directive ProxyConnectTimeout.
   connecttimeout seems to be a worker property to me, which should
   have a reasonable default if not set for a worker.

6. In the light of 5. this comment might look odd, but to be honest I
   do not get what you gain via connecttimeout. Let me analyse the
   situation when you have your Tomcat backend on a Linux system
   (checked on a 2.4 kernel; may vary with 2.6 kernels). First I rule
   out two factors for long connection times:

   1. A bad network.
   2. A backend box so busy that the OS cannot process TCP packets in a
      timely manner.

   So what remains: if the backlog (however set, either the OS default
   or a dedicated parameter in a patched Tomcat) is full, we have three
   possibilities:
   1. The tcp syn backlog (set via
      /proc/sys/net/ipv4/tcp_max_syn_backlog) still has free capacity:
      in this case the OS still responds to the SYN packets from the
      reverse proxy with SYN-ACK packets, but silently ignores the
      answering ACK packet from the reverse proxy. Nevertheless the
      reverse proxy thinks that it is connected, so the connect call
      returns. --> No long connection time.

   2. The tcp syn backlog is full. In this case it depends on whether
      the OS has syn cookies enabled
      (/proc/sys/net/ipv4/tcp_syncookies). If yes, we have the same
      situation regarding the connection time as in 1. (no long
      connection time); otherwise the OS drops the SYN packets from the
      reverse proxy and we have a long connection time. Given that
      tcp_max_syn_backlog can only be set globally and its default is
      rather large (1024), I do not think we get into this situation
      frequently.

   3. /proc/sys/net/ipv4/tcp_abort_on_overflow is set. In this case
      connections that do not fit into the backlog of the socket get a
      RST packet sent immediately after the ACK packet from the client
      arrives. So here we also have no long connection time.

   As the behaviour above may not hold for OSes other than Linux,
   connecttimeout might still make sense. Thus I wrote 5. :-).

balancer.patch:

1. As mentioned before, the "Retry a worker in error state before
   using the redirect worker" feature is already implemented.

2. I agree with Jim that using loadfactor 0 is a misuse of the
   loadfactor. As I mentioned before in a different mail, we should
   define a state for a worker (whatever its name; I don't want to go
   back into the disabled / stopped arguing :-)) that lets a worker
   accept only requests that match its route and nothing else.

So what is the summary of all of the above? Please resubmit your
patches and we'll go into the next round of review.

Regards

Rüdiger
