On 10/05/2006 11:07 AM, Mathias Herberts wrote:
> As I mentioned a few days ago here are my patches for mod_proxy.
>
> They modify mod_proxy behaviour in several ways to better fit our
> production environment where we heavily use Apache and Tomcat.
>
> Just so you understand our context, we use a pool of Apache servers as
> front-ends to Tomcat application servers. Each Apache server serves
> several virtual hosts which connect (using AJP 1.3) to backend Tomcat
> servers.
>
> The version of Apache we use is 2.2 and the version of Tomcat 5.5.
Thank you very much for submitting the patches. Some general remarks:

1. Please create your patches against trunk. All patches need to be
   applied to trunk first and can be backported later to the stable
   branch(es). There are only very rare cases where the code in trunk
   has changed so much that a change cannot be applied to trunk first
   before it gets applied to a branch. This also helps you notice what
   has already been done. E.g. the "Retry a worker in error state
   before using the redirect worker" feature is already implemented in
   trunk. It has even already been backported to the 2.2.x branch and
   will be part of 2.2.4. If you spot a change in trunk that has not
   been backported and that you consider useful for a stable branch,
   you can ask here on the list for it to be backported.

2. Please split the patches so that each patch contains only one
   logical change / feature. E.g. the balancer.patch file contains two
   things:

   1. Allow load factor 0
   2. Retry a worker in error state before using the redirect worker

   This has the benefit that patches are easier to review, and that
   things which are undisputed can be applied sooner and independently
   of other things.

3. Please follow the code style guide
   (http://httpd.apache.org/dev/styleguide.html) when writing patches.
   Your patches violate this guide especially by using tabs instead of
   spaces, plus some other smaller things (missing spaces, wrong
   indentation).

4. Don't feel discouraged by the comments above :-). Although these may
   look like stupid hoops meant to torture new contributors, in fact
   they are not. As the developers with commit access are all
   volunteers, time is one of the most valuable resources here. The
   priority with which developers review patches also depends heavily
   on how easy a patch is to review and commit. As both sides, the
   project and the contributor, benefit from a timely review, it makes
   sense for the contributor to prepare the patch in a way that makes
   the review effective.
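As a quick self-check before resubmitting, the tab problem from point 3
is easy to catch mechanically. A minimal sketch (the filename
balancer.patch is just taken from point 2 as an example):

```shell
# Flag any added line of the patch that contains a literal tab
# character (the httpd style guide mandates spaces).
grep -n "$(printf '\t')" balancer.patch \
    && echo "tabs found: please convert to spaces before resubmitting"
```

The same one-liner works on any of the patch files, and running it
before posting saves one review round-trip.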
Now some more specific comments:

ajp.patch:

1. +        else if (!strcasecmp(key,"forceclose")) {
   +            /* Do not reuse backend connections */
   +            if (!strcasecmp(val, "on"))
   +                worker->force_close = 1;
   +            else if (!strcasecmp(val, "off"))
   +                worker->sticky_force = 0;

   Guess that should be worker->force_close = 0 above.

2. +++ ./modules/proxy/mod_proxy.h    2006-09-13 17:58:45.000000000 +0200
   @@ -289,6 +292,9 @@
        * may be available while exceeding the soft limit */
        apr_interval_time_t timeout; /* connection timeout */
        char timeout_set;
   +    apr_interval_time_t connect_timeout; /* connection timeout */
   +    char connect_timeout_set;
   +    int force_close; /* discard backend connection after use */
        apr_interval_time_t acquire; /* acquire timeout when the maximum
                                        number of connections is exceeded */
        char acquire_set;
        apr_size_t recv_buffer_size;

   @@ -323,6 +329,7 @@
        apr_array_header_t *workers; /* array of proxy_workers */
        const char *name; /* name of the load balancer */
        const char *sticky; /* sticky session identifier */
   +    int sticky_case; /* Ignore case when looking for sticky */
        int sticky_force; /* Disable failover for sticky sessions */
        apr_interval_time_t timeout; /* Timeout for waiting on free connection */
        int max_attempts; /* Number of attempts before failing */

   As already mentioned by someone else: if you add new members to a
   struct that is part of the public API, always add them at the end of
   the struct. Although over time this separates related struct members
   from each other, it is required for compatibility: within a stable
   branch, binary compatibility is guaranteed, and changes that break
   this compatibility cannot be backported from trunk to a stable
   branch. There is a versioning scheme for this:
   MODULE_MAGIC_NUMBER_MAJOR and MODULE_MAGIC_NUMBER_MINOR, which can
   be found in ap_mmn.h.
Breaking binary compatibility requires a change of
MODULE_MAGIC_NUMBER_MAJOR (a major bump). Changes that do not break
binary compatibility, but where third-party sources using an added
feature of the API (like the new struct members) need to require a
certain minimum version of the product, only need a change of
MODULE_MAGIC_NUMBER_MINOR (a minor bump). In the case of a major bump,
MODULE_MAGIC_NUMBER_MINOR is reset.

3. --- ./modules/proxy/mod_proxy_ajp.c.orig    2006-09-18 22:01:53.000000000 +0200
   +++ ./modules/proxy/mod_proxy_ajp.c    2006-09-13 17:52:48.000000000 +0200
   @@ -510,7 +510,7 @@
        }

        backend->is_ssl = 0;
   -    backend->close_on_recycle = 0;
   +    backend->close_on_recycle = worker->force_close;

   close_on_recycle is deprecated. Please use close instead.

4. Apart from the comments by others on the case insensitive session
   parameter, I do not think that the session parameter itself should
   be set via stickycasesession. I think this should be a parameter
   that turns case sensitive / case insensitive comparison on or off.

5. I see no need for the additional directive ProxyConnectTimeout.
   connecttimeout seems to be a worker property to me, which should
   have a reasonable default if not set for a worker.

6. In the light of 5. this comment might look odd, but to be honest I
   do not get what you gain via connecttimeout. Let me analyse the
   situation when you have your Tomcat backend on a Linux system
   (checked on a 2.4 kernel; may vary with 2.6 kernels). First I rule
   out two factors for long connection times:

   1. A bad network.
   2. A backend box so busy that the OS cannot process TCP packets in a
      timely manner.

   So what remains: if the backlog (however set, either the OS default
   or a dedicated parameter in a patched Tomcat) is full, we have three
   possibilities:
   1. The tcp syn backlog (set via
      /proc/sys/net/ipv4/tcp_max_syn_backlog) still has free capacity:
      in this case the OS still responds to the SYN packets from the
      reverse proxy with SYN-ACK packets, but silently ignores the
      answering ACK packet from the reverse proxy. Nevertheless the
      reverse proxy thinks that it is connected, so the connect call
      returns. --> No long connection time.

   2. The tcp syn backlog is full. In this case it depends on whether
      the OS has syn cookies enabled
      (/proc/sys/net/ipv4/tcp_syncookies). If yes, we have the same
      situation regarding the connection time as in 1. (no long
      connection time); otherwise the OS drops the SYN packets from the
      reverse proxy and we have a long connection time. Given that
      tcp_max_syn_backlog can only be set globally and its default is
      rather large (1024), I do not think we get into this situation
      frequently.

   3. /proc/sys/net/ipv4/tcp_abort_on_overflow is set. In this case
      connections that do not fit into the backlog of the socket get a
      RST packet sent immediately after the ACK packet from the client
      arrives. So here we also have no long connection time.

   As the behaviour above may not hold for OSes other than Linux,
   connecttimeout might still make sense. Thus I wrote 5. :-).

balancer.patch:

1. As mentioned before, the "Retry a worker in error state before
   using the redirect worker" feature is already implemented.

2. I agree with Jim that using loadfactor 0 is a misuse of the
   loadfactor. As I mentioned before in a different mail, we should
   define a state for a worker (whatever its name; I don't want to go
   back into the disabled / stopped arguing :-)) that lets a worker
   accept only requests that match its route and nothing else.

So what is the summary of all of the above? Please resubmit your
patches and we'll go into the next round of review.

Regards

Rüdiger
