After a few face-to-face discussions with some of you (Stefano on the phone ranting about setting up Tomcat, Jeremy over lunch at my place a few weeks ago, and several others), and a few odd questions popping up on the list, I feel the need to explain why my vision is so narrow whenever someone touches the "Apache" argument.
As I have said several times over the past 6 years, I've learnt how to use Apache (1.3 first, and 2.0 lately) to suit my needs, and I would never envision an HTTP server running without it. Given my "pragmatic" vision, it's hard to explain "why" I am so biased, and probably the best way out of the loop is to share the few things I have learnt that make my everyday life as an administrator easy... So, here are a few tips for those of you who wonder about my rants. (I should really post these to the Wiki, but dammit, I don't know how to use it :-)

Why Apache as a front end?
--------------------------

Probably the first and most important question to answer is WHY it is so important to have Apache HTTPd as the front end of a website.

I believe that for anyone, there's nothing more annoying than hitting a web page, waiting a few seconds, and then seeing our favorite browser come up with "The connection was refused when attempting to contact http://www.domain.tld/".

In my opinion (and my boss') it is unacceptable for a website to have "downtime", and if it does happen, whoever connects needs to know what's going on, or at least we need to tell them something:

    "We are sorry, but currently http://www.domain.tld/ is unavailable because of essential system upgrades. We expect to resume all our services in less than 10 minutes. Please check back later"

sounds so much better (maybe with our nice little logo, and yada, yada, yada).

When I once asked Brian Behlendorf why Apache did some odd things in its code, he responded "Call it defensive programming", and that explains the entire vision behind Apache: no matter what, Apache can _not_ "go down" and stop responding to HTTP requests. This is its essence, and its design is centered around this idea, so, in my opinion (and experience), it is the one option that allows us to achieve our goal of "zero port 80 downtime".
Apache's design enforces a multi-process model: there is always a minimal parent process that binds port 80 (kept as small and safe as possible) and maintains a pool of child processes that accept and serve the requests. This means that even in the worst-case scenario (a segmentation fault that kills one of those child processes), the server as a whole stays up: the parent simply replaces the dead child, port 80 stays bound, and the next request is answered.

A Java-based web server cannot achieve this. The JVM runs as a single OS process, and if something happens to it, it simply exits, unbinding port 80 and leaving our clients with "connection refused".

There is another important issue: security. Java does not support switching user ID after it has started, and on UNIX operating systems, as everyone knows, no one apart from "root" can bind to ports below 1024. In our case this is a problem: either I run my service as root (and that is NOT a good idea), or I bind to some port above 1024 (usually 8080). But then complexity arises when forwarding requests for port 80 (our usual HTTP service) to a port above 1024 (8080): whether you use firewall packages or port remappers, any of those solutions involves some degree of complexity.

Apache avoids all that. Being native, it can bind to ports below 1024 and then run as a non-privileged user, which in turn allows us to run our servlet container as a non-privileged user as well.

But those are not the only advantages: Apache helps us in much, much better ways, and I hope at this point to be able to show you what and how...

What Apache? How Apache?
------------------------

A very personal choice is which version of Apache you want to run. In the following examples I will assume you're going to use Apache 2.0, as it is now _stable_ and performs much better than the "old" 1.3. For several months now most of the sites hosted by VNU (my employer) have been running 2.0 (apart from our old legacy "rolaren" server), and in my personal experience I never had a single problem.
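Going back to the port 80 problem described above: a native server gets away with it because it can bind the privileged port while still running as root and then permanently switch to an unprivileged account before serving anything (roughly, Apache keeps a small root-owned parent and switches its children to the account named by its User and Group directives). The following is a minimal, POSIX-only Python sketch of that bind-then-drop pattern. It is not Apache's code: "bind_then_drop" is a hypothetical helper, and the numeric uid/gid of the unprivileged account are assumptions you would supply.

```python
import os
import socket


def bind_then_drop(host, port, uid=None, gid=None, backlog=128):
    """Bind a listening TCP socket, then give up root privileges (POSIX only).

    Hypothetical helper for illustration. `uid` and `gid` are the numeric
    IDs of the unprivileged account to switch to; if the process is not
    running as root, or no IDs are given, nothing is dropped.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # Step 1: bind while we still hold the privileges to do so. On
    # traditional UNIX systems only root may bind ports below 1024.
    sock.bind((host, port))
    sock.listen(backlog)

    # Step 2: drop root for good. Supplementary groups and the group ID
    # can only be changed while we are still root, so the user ID goes last.
    if os.geteuid() == 0 and uid is not None and gid is not None:
        os.setgroups([])
        os.setgid(gid)
        os.setuid(uid)

    # The already-bound socket keeps working after the switch.
    return sock
```

A process started as root could call, for instance, bind_then_drop("0.0.0.0", 80, uid=pw.pw_uid, gid=pw.pw_gid) with pw = pwd.getpwnam("www"), where "www" stands in for whatever unprivileged account you use. Pure Java has no standard equivalent of the setuid() step, which is exactly why the servlet container ends up on 8080 behind Apache.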
Apache 2.0, though, is somewhat more "difficult" to build and configure; the hardest decision is which MPM (Multi-Processing Module) to use. Read the manual to choose what suits you best, but in my case the "worker" MPM (multi-process, multi-threaded) is the one giving me the best performance/solidity ratio. The "www.apache.org" website, on the other hand, uses the "prefork" MPM (multi-process, single-threaded, exactly as Apache 1.3 did), but I feel that on certain operating systems it is slightly slower than "worker". Your choice.

As a reference, this is how I configure Apache 2.0:

    ./configure \
      --with-mpm=worker \
      --enable-modules=all \
      --enable-mods-shared=all \
      --enable-proxy \
      --enable-proxy-http \
      --disable-ipv6

Basically, I use the "worker" MPM, all modules are compiled as DSOs (dynamically loaded, so that I can disable the ones I don't use), including mod_proxy and mod_proxy_http, and I don't care about IPv6 support.

Connecting Cocoon
-----------------

Like Stefano, I had several headaches trying to connect Apache and [name your servlet container of choice]. Mod_JK (JK2) doesn't work for me, and mod_webapp works for me, but only because I'm its author (and I was sadly forced to abandon its development). The only solution I see (and the one that currently works best for me) is mod_proxy.

Mod_proxy is a nice little module, especially in Apache 2.0, where its caching part has been completely decoupled into another module (mod_cache): it's very small, lightweight, and does the job... Plus, you have the advantage of choosing whatever servlet container you want in the back end: Orion, WebSphere, Tomcat, Jetty, you name it, they all speak HTTP :-) (well, apart from ServletExec, but that's another story; if someone wants some hints, let me know).
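Since mod_proxy only needs the back end to speak plain HTTP, the idea can be illustrated with a tiny HTTP forwarder. The Python sketch below is a hypothetical illustration, not how mod_proxy is implemented: it forwards GET requests to a back end (assumed, as in the text, to be on localhost:8080), rewrites "Location" headers that point at the back end so redirects stay on the front end, and answers with a 502 and a small holding page when the back end cannot be reached. The names make_front_end, start, HOLDING_PAGE and SKIP_HEADERS are made up for this example.

```python
import http.client
import http.server
import threading

# Hypothetical holding page, sent when the back end cannot be reached.
HOLDING_PAGE = b"<html><body><h3>Server off-line</h3></body></html>"

# Headers not copied from the back-end response: hop-by-hop headers, plus
# the ones BaseHTTPRequestHandler or this sketch add by themselves.
SKIP_HEADERS = {"connection", "keep-alive", "transfer-encoding",
                "content-length", "server", "date"}


def make_front_end(backend_host="localhost", backend_port=8080):
    """Build a request handler class that forwards GET requests to the back end."""
    # Trailing slash so "http://host:80" never matches "http://host:8080/...".
    backend_base = f"http://{backend_host}:{backend_port}/"

    class FrontEnd(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            conn = http.client.HTTPConnection(backend_host, backend_port, timeout=5)
            try:
                conn.request("GET", self.path)
                resp = conn.getresponse()
                status, headers, body = resp.status, resp.getheaders(), resp.read()
            except (OSError, http.client.HTTPException):
                # Back end down or broken: answer with a 502 and a friendly
                # page instead of leaving the client with "connection refused".
                status = 502
                headers = [("Content-Type", "text/html")]
                body = HOLDING_PAGE
            finally:
                conn.close()

            self.send_response(status)
            for name, value in headers:
                if name.lower() in SKIP_HEADERS:
                    continue
                if name.lower() == "location" and value.startswith(backend_base):
                    # Redirects must point at the front end, never the back end.
                    value = "/" + value[len(backend_base):]
                self.send_header(name, value)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return FrontEnd


def start(handler, port=0):
    """Serve `handler` on 127.0.0.1:port (0 = any free port) in a background thread."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Note that this sketch buffers whole responses in memory and handles only GET; it is meant to show the shape of the arrangement (front end always answering, back end replaceable), not to stand in for mod_proxy.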
Connecting Cocoon is _simple_: all you have to do is configure your servlet container to run on a high port (8080 for example), make sure it runs as a non-privileged user, make sure that it knows it is a proxied HTTP server (Cocoon, Jetty, Resin, Orion, ... they all have this concept, check out the documentation), and configure Apache with these two lines:

    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/

The first one tells Apache that any request whatsoever (from / onwards) gets "proxied" to localhost:8080, and the second one tells Apache to rewrite any "Location" HTTP header coming back accordingly (just in case your servlet container doesn't let you set the "proxied" configuration).

That's _IT_. It runs, and it runs smoothly.

Trivially serving static files
------------------------------

Now, Apache is _definitely_ faster than any Java-based servlet container at serving files straight to HTTP clients. This is largely because nowadays it can use the kernel's "sendfile" system call, which gives it far better performance than anything Java can do.

Using mod_proxy, the set of ProxyPass configuration directives doesn't allow us to specify a "pattern" for resources to be served straight off the filesystem; it only allows us to define exclusion lists and processing lists. In my example, then, I will rewrite my configuration to make Apache serve everything beginning with "/static/" straight out of my web application, without even touching the servlet container:

    # Make sure that my document root points to the root of the web
    # application (where the WEB-INF is located, for instance).
    DocumentRoot /export/webapps/cocoon

    # We don't proxy any request beginning with the keyword "/static/".
    # So, for example, "/static/logo.gif" will be served directly by
    # Apache from the "/export/webapps/cocoon/static/logo.gif" file
    ProxyPass /static/ !
    # Another one for "favicon.ico", so that Explorer and Mozilla are happy
    ProxyPass /favicon.ico !

    # And now we send back to the servlet engine everything else that does
    # not begin with "/static/" or "/favicon.ico"
    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/

Simple: the "!" keyword in ProxyPass means "don't" :-)

The holding page
----------------

If you use either of the configurations above, you'll see that if your servlet container is not responding on port 8080 for any reason, you will get a nice "Bad Gateway" error page (HTTP 502 error). As that page is quite ugly (I have to admit that the HTTPd freaks are not good HTML artists), you might want to point your clients to a better-designed page (or one containing some lame excuse for why your servlet container is down).

You can do that easily (again) by using the ErrorDocument directive. Note, though, that the ErrorDocument directive needs a document Apache can serve on its own, so it must not be proxied: with the servlet container down, a proxied error page would fail as well. Either you get down and nasty with your mod_alias configuration, or, more simply, you use the "/static/" configuration above and include the page in your webapp as a static file. Either way, all you have to specify is:

    # If mod_proxy cannot connect to the servlet container, we want
    # to display a nice static page explaining the reason
    ErrorDocument 502 /static/unavailable.html

If, for example, you wanted to use Server-Side Includes to render your page (it might be nice to display something like the host name, or the time when the request was received), you can do so by using SHTML files. This is what I use at home:

    <html>
      <head>
        <title><!--#echo var="SERVER_NAME"-->: server off-line</title>
      </head>
      <body>
        <h3><!--#echo var="SERVER_NAME"-->: server off-line</h3>
        <p>
          We are sorry, but the server is temporarily unavailable due to
          maintenance.
          Our team is working to restore service as soon as possible.<br />
          In case of trouble, please feel free to contact our webmaster by
          sending an email to
          <a href="mailto:<!--#echo var="SERVER_ADMIN"-->">
            &lt;<!--#echo var="SERVER_ADMIN"-->&gt;
          </a>.
        </p>
        <hr/>
        <p>
          <small>
            <!--#echo var="SERVER_SOFTWARE"--> running on
            <!--#echo var="SERVER_NAME"-->:<!--#echo var="SERVER_PORT"-->
            at <!--#echo var="DATE_LOCAL"-->.
          </small>
        </p>
      </body>
    </html>

And to make it work properly, this is what your httpd.conf will have to look like:

    # Make sure that Server-Side Includes are processed and sent
    # to the client with MIME type text/html
    AddType text/html .shtml
    AddOutputFilter Includes .shtml

    # Make sure that our SHTML files are processed in the static
    # directory
    <Directory "/export/webapps/cocoon">
      Options +IncludesNoExec
    </Directory>

    # If mod_proxy cannot connect to the servlet container, we want
    # to display a nice static page explaining the reason. This is an
    # SHTML page (using the Server-Side Includes filter)
    ErrorDocument 502 /static/unavailable.shtml

Putting it all together with mod_proxy
--------------------------------------

OK, now that we have seen how each piece fits, let's put them all together, also making any request to "/WEB-INF/" forbidden straight away (there's no point in proxying those requests when we know that the servlet container will block them all):

    # Make sure that my document root points to the root of the web
    # application (where the WEB-INF is located, for instance).
    DocumentRoot /export/webapps/cocoon

    # Make sure that Server-Side Includes are processed and sent
    # to the client with MIME type text/html
    AddType text/html .shtml
    AddOutputFilter Includes .shtml

    # Make sure that our SHTML files are processed in the static
    # directory
    <Directory "/export/webapps/cocoon">
      Options +IncludesNoExec
    </Directory>

    # Block the stupid "WEB-INF" pseudo-URL (god I wish web applications
    # were designed with some intelligence...
    # OK, my fault as well)
    <Location /WEB-INF>
      Order deny,allow
      Deny from all
    </Location>

    # If mod_proxy cannot connect to the servlet container, we want
    # to display a nice static page explaining the reason. This is an
    # SHTML page (using the Server-Side Includes filter)
    ErrorDocument 502 /static/unavailable.shtml

    # We don't proxy any request beginning with the keyword "/static/".
    # So, for example, "/static/logo.gif" will be served directly by
    # Apache from the "/export/webapps/cocoon/static/logo.gif" file
    ProxyPass /static/ !

    # Another one for "favicon.ico", so that Explorer and Mozilla are happy
    ProxyPass /favicon.ico !

    # And now we send back to the servlet engine everything else that does
    # not begin with "/static/" or "/favicon.ico"
    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/

Simple, easy, beautiful...

A more complex example: mod_rewrite
-----------------------------------

This is all nice and clean, but if we want to get really nasty and start serving (for example) all our GIF and JPEG files straight from Apache, we need to use mod_rewrite.

I know, mod_rewrite is ugly, it uses Perl-compatible regular expressions (so, well, it's even slightly slower), but mod_proxy is way too crummy: it's either "in" or "out", and it takes over the whole world (you can't really do much else once you've said you're going to forward a URL).

So mod_rewrite, even if it's ugly, even if it's slower, _is_ our solution. With a couple of rules we can take the configuration written above to the extreme, and do basically WHATEVER we want with a URL _before_ it even knows about a possible servlet container in the back end.

I suggest you read the mod_rewrite documentation _carefully_, but as a start, I'm going to rewrite what's written above using RewriteRule and its flags; from here on, you're on your own :-) :-)

    # Make sure that my document root points to the root of the web
    # application (where the WEB-INF is located, for instance).
    DocumentRoot /export/webapps/cocoon

    # Make sure that Server-Side Includes are processed and sent
    # to the client with MIME type text/html
    AddType text/html .shtml
    AddOutputFilter Includes .shtml

    # Make sure that our SHTML files are processed in the static
    # directory
    <Directory "/export/webapps/cocoon">
      Options +IncludesNoExec
    </Directory>

    # If mod_proxy cannot connect to the servlet container, we want
    # to display a nice static page explaining the reason. This is an
    # SHTML page (using the Server-Side Includes filter)
    ErrorDocument 502 /static/unavailable.shtml

    # The nastiness begins: let's fire up the "rewrite engine"
    RewriteEngine On

    # Everything that starts with "/static" or "/static/" is served straight
    # through: no redirection, no proxying, no nothing (the "-" substitution
    # leaves the URL untouched), and the [L] flag means that if this rule
    # matches, no further rules are processed
    RewriteRule "^/static/?(.*)" "-" [L]

    # Everything that starts with a case-insensitive match (the NC flag)
    # of "/WEB-INF" or "/WEB-INF/" is forbidden (the F flag). And again,
    # this is the last rule (the L flag): nothing else will be processed
    # by the rewrite engine if this rule matches
    RewriteRule "^/WEB-INF/?(.*)" "-" [L,F,NC]

    # Everything ending in ".gif", ".jpg" or ".jpeg" will also be served
    # directly by Apache, no need to bother the servlet container. As above,
    # this is the last rule, as specified by the [L] flag at the end
    RewriteRule "^/(.*)\.gif$" "-" [L]
    RewriteRule "^/(.*)\.(jpg|jpeg)$" "-" [L]

    # Everything else not matched above goes to the servlet container
    # via HTTP on port 8080. The [P] flag (which is required) means that
    # our requests will be handled by mod_proxy.
    RewriteRule "^/(.*)" "http://localhost:8080/$1" [P]

    # Make sure that if the servlet container sends a "Location" HTTP
    # header starting with "http://localhost:8080/" during a redirect, we
    # rewrite it and return to our client the effective (not real)
    # location we want to redirect them to. This is _essential_.
    ProxyPassReverse / http://localhost:8080/

As I mentioned before: ugly, but _really_ effective. In a few lines we connect the HTTP-based servlet container running Cocoon to Apache; we make sure that if the servlet container falls over, people are directed to an appropriate holding page; we serve everything under /static, and all GIF and JPEG files, straight off the disk without touching Cocoon, and all the rest through our sitemap; and as a free bonus, everything that ends in ".shtml" (from disk or from the sitemap) will be passed through the Apache "Server-Side Includes" filter (mod_include, which is ugly, but sometimes _really_ effective)...

Conclusions
-----------

I hope I have cleared up some of the doubts about Apache, and why I love it so much... It is a hub, a hub embracing your website and making it work better, faster, more reliably, and fine-tuned exactly as you (or your boss) like it.

And you can trust Apache. I believe that our spirit, the spirit of the entire Cocoon community, is built on top of the original HTTPd vision: let's make things work so nicely that the world won't have to look for another solution... HTTPd does it in its little piece of being an HTTP hub, Jetty does it in its little piece of being a servlet container, and Cocoon does it in its little piece of being the best "web-application" framework available on the planet right now.

Together, those three little pieces _will_ conquer the world.

Have fun...

    Pier

(BTW, where the hell is Tomcat in this picture? :-)