[ ob-executive-summary (as this message somehow got far longer than I
  meant for it to be):

  ProxyPreserveHost is a great addition to the mod_proxy directive
  set, but a more general ability to set/manipulate arbitrary headers
  on proxy requests is also extremely useful. I have a patch that
  allows this, and which has been used in production (on several sites
  and with very heavy traffic) for several months, but it's for
  1.3.19. I will try to explain why this functionality is so important
  to our shop and attempt to convince other folks that the 1.3 tree
  would benefit from the inclusion of a "ProxyRequestHeader"
  directive.

  We're still deploying on my patched 1.3.19 because we rely so
  heavily on this ProxyRequestHeader functionality. I give
configuration examples below from our currently-in-roll-out project,
  http://beta.democrats.org. ]


Hi,

It's good to see the proxy-preserve-host patch being worked on. One of
our "typical" configurations involves reverse proxying through
multiple virtual hosts to a single backend server. It's often
important to know which virtual host the request came in for, so that
the backend server can adjust its content generation accordingly.

I'd like to give an example, because we use this kind of setup all the
time, and there's still a piece missing from the solution even with
the ProxyPreserveHost directive available...

At the moment, we're "live testing" http://beta.democrats.org/ (which
will, barring any unforeseen circumstances, become
http://www.democrats.org on Thursday night). The beta.democrats.org
domain is served by a pair of machines in a classic reverse-proxying
configuration -- the proxy server sits on the network fielding all
comers, doing lots of rewrite-rule stuff, and caching as aggressively
as possible. Another machine sits behind the proxy server doing all
the content generation.

Here's the first problem, which ProxyPreserveHost could solve quite
nicely:

In a few hours, we'll be bringing up Spanish-language content on
es.beta.democrats.org. The new 'es' domain will be served in exactly
the same way, by the same two machines, and the backend server needs
to know whether it's supposed to be thinking 'English' or 'Spanish.'
Here's how we've been doing that, with our patched 1.3.19.

[ stripped down example for clarity ]

  ---------------------------------
    
    ProxyPassReverse      /            http://192.168.1.243/
    RewriteRule          ^/(.*)$       http://192.168.1.243/$1        [P] 

    NameVirtualHost 0.0.0.0
    # the en host should come first -- it's the default
    <VirtualHost 0.0.0.0>
      ServerName          beta.democrats.org
      CacheRoot           "/usr/local/apache/proxy-cache-en"
      RewriteEngine       On
      RewriteOptions      inherit
      ProxyRequestHeader  set   Req-Language   "en"
      ProxyRequestHeader  set   Req-Host       "beta.democrats.org"
    </VirtualHost>
    
    <VirtualHost 0.0.0.0>
      ServerName          es.beta.democrats.org
      CacheRoot           "/usr/local/apache/proxy-cache-es"
      RewriteEngine       On
      RewriteOptions      inherit
      ProxyRequestHeader  set   Req-Language   "es"
      ProxyRequestHeader  set   Req-Host       "es.beta.democrats.org"
    </VirtualHost>

  ---------------------------------

So we use a new configuration directive, ProxyRequestHeader, to pass
information to the back-end server.

Here's why we need to be able to set other, arbitrary headers (problem
two, which ProxyPreserveHost doesn't quite solve, at least not in a
generalizable way):

Both domains -- 'beta..' and 'es.beta..' -- need to serve SSL
connections as well as normal page requests. Again, the back-end
server does all the "real" work, but the proxy must accept the SSL
connection, handle the encryption-related issues, and reverse proxy
the request through to the backend as a normal request.

Here's how we do that:

[ again, stripped down and with SSL stuff elided ]

  ---------------------------------

    NameVirtualHost 0.0.0.0:443
    # again, english comes first
    <VirtualHost 0.0.0.0:443>
     ServerName          beta.democrats.org
    
     [ mod_ssl stuff ]
    
     ProxyRequestHeader  set   Req-Language   "en"
     ProxyRequestHeader  set   Req-Host       "beta.democrats.org"
     ProxyRequestHeader  set   Req-HTTPS      "on"
     RewriteEngine       On
     RewriteOptions      inherit
    </VirtualHost>                                  
    
    <VirtualHost 0.0.0.0:443>
     ServerName          es.beta.democrats.org
    
     [ mod_ssl stuff ]
    
     ProxyRequestHeader  set   Req-Language   "es"
     ProxyRequestHeader  set   Req-Host       "es.beta.democrats.org"
     ProxyRequestHeader  set   Req-HTTPS      "on"
     RewriteEngine       On
     RewriteOptions      inherit
    </VirtualHost>        

  ---------------------------------

And these are moderately simple configurations, mostly because we've just
built this site from scratch and it hasn't had time to grow various
appendages and mutate into a rewrite-rule-laden beast. On some of our
sites we have a mixture of virtual-host and directory-based proxy
configurations, which is why the ProxyRequestHeader directive takes an
optional extra final argument, a pattern to match against the URL.

So you can do something like this, for example:

  RewriteRule         ^/image/(.*)$   http://192.168.1.243/img/$1    [P] 
  ProxyRequestHeader  set             Req-Old-Image  "yes"     "^/image"

My patch that included the ProxyRequestHeader functionality was for
1.3.19. It also included some downstream cache control facilities (two
more configuration directives) and, for symmetry, a
ProxyResponseHeader directive that works the same way as
ProxyRequestHeader. But by far the most important of those features,
for us, is the ability to set arbitrary proxy request headers. The
other features are in the patch because we do use them occasionally
and they involved touching many of the same parts of the code. (It is
clear to me in retrospect that I should have submitted *separate*
patches for the various features -- I apologize for not doing that.)

(A full description of my 1.3.19 patch can be found at:
http://allafrica.com/tools/apache/mod_proxy/ )

I guess I have a couple of questions. Is there any chance that this
ProxyRequestHeader functionality could be deemed useful enough that it
would make it into the next 1.3 release? It's certainly important to
us -- we're still deploying all of our large sites on my patched
1.3.19 because we so often need to set these headers. I haven't ported
the patch to later versions partly because of a lack of time, and
partly because of an if-it's-not-broken-don't-fix-it conservatism. If
there is a chance that ProxyRequestHeader (or something very like it)
could become part of the 1.3 tree, what can I do to help that happen?
I'm very happy to rewrite and resubmit the ProxyRequestHeader part of
the patch (and, of course, just as happy if someone who knows the
codebase better than I do would prefer to do so).

If you made it this far, I feel obliged to thank you for taking
several hours out of your day to read my missive...

Kwindla
