Author: rjung
Date: Thu Mar 5 01:33:59 2009
New Revision: 750276

URL: http://svn.apache.org/viewvc?rev=750276&view=rev
Log:
Adding a new documentation page about the special situation of a reverse proxy. Needs some checking and proof reading.

Added:
tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml (with props)

Added: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml?rev=750276&view=auto
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml (added)
+++ tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml Thu Mar 5 01:33:59 2009
@@ -0,0 +1,303 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<!DOCTYPE document [
+ <!ENTITY project SYSTEM "project.xml">
+]>
+<document url="proxy.html">
+
+ &project;
+<copyright>
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+</copyright>
+<properties>
+<title>Reverse Proxy HowTo</title>
+<author email="rj...@apache.org">Rainer Jung</author>
+<date>$Date$</date>
+</properties>
+<body>
+<section name="Introduction">
+<br/>
+<p>The Apache module mod_jk and its ISAPI and NSAPI variants connect
+a web server to a backend (typically Tomcat) using the AJP protocol.
+The web server receives an HTTP(S) request and the module forwards
+the request to the backend. This situation is usually called a gateway
+or a proxy; in the context of HTTP it is called a reverse proxy.
+</p>
+</section>
+<section name="Typical problems">
+<br/>
+<p>A reverse proxy is not totally transparent to the application on
+the backend. For instance, the host name and port the original client
+(e.g. a browser) needs to talk to belong to the web server and not to the
+backend, so the reverse proxy talks to a different host name and port.
+When the application on the backend returns content including
+self-referencing URLs using its own backend address and port, the
+client will usually not be able to use these URLs.
+</p>
+<br/>
+<p>Another example is the client IP address, which for the web server is the
+source IP of the incoming connection, whereas for the backend the
+connection always comes from the web server. This can be a problem when
+the client IP is checked by the backend application, e.g. for security reasons.
+</p>
+</section>
+<section name="AJP as a solution">
+<br/>
+<p>Most of these problems are automatically handled by the AJP protocol
+and the AJP connectors of the backend. The AJP protocol transports
+this communication metadata and the backend connector presents this
+metadata whenever the application asks for it using Servlet API methods.
+</p>
+<p>The following list contains the communication metadata handled by AJP
+and the HttpServletRequest API calls which can be used to retrieve them:
+<ul>
+<li>local name: <code>getLocalName()</code> and <code>getLocalAddr()</code>.
+This is also equal to <code>getServerName()</code>, unless a <code>Host</code> header
+is contained in the request. In this case the server name is taken from that header.
+</li>
+<li>local port: <code>getLocalPort()</code>.
+This is also equal to <code>getServerPort()</code>, unless a <code>Host</code> header
+is contained in the request. In this case the server port is taken from that header
+if it contains an explicit port, or is equal to the default port of the scheme used.
+</li>
+<li>client address: <code>getRemoteAddr()</code>
+</li>
+<li>client host: <code>getRemoteHost()</code>
+</li>
+<li>authentication type: <code>getAuthType()</code>
+</li>
+<li>remote user: <code>getRemoteUser()</code>,
+if <code>tomcatAuthentication="false"</code>
+</li>
+<li>protocol: <code>getProtocol()</code>
+</li>
+<li>HTTP method: <code>getMethod()</code>
+</li>
+<li>URI: <code>getRequestURI()</code>
+</li>
+<li>HTTPS used: <code>isSecure()</code>, <code>getScheme()</code>
+</li>
+<li>query string: <code>getQueryString()</code>
+</li>
+<li>SSL cipher: <code>getAttribute("javax.servlet.request.cipher_suite")</code>
+</li>
+<li>SSL key size: <code>getAttribute("javax.servlet.request.key_size")</code>
+</li>
+<li>SSL client certificate: <code>getAttribute("javax.servlet.request.X509Certificate")</code>
+</li>
+<li>SSL session ID: <code>getAttribute("javax.servlet.request.ssl_session")</code>.
+This attribute is specific to Tomcat; it has not yet been standardized.
+</li>
+</ul>
+</p>
+</section>
+<section name="Fine tuning">
+<br/>
+<p>In some situations this is not enough, though. Assume there is another,
+less clever reverse proxy in front of your web server, for instance an
+HTTP load balancer or similar device which also serves as an SSL accelerator.
+</p>
+<p>Then you know that all your clients use HTTPS, but your web server doesn't
+know about that. All it can see is requests coming from the accelerator using
+plain HTTP.
+</p>
+<p>Another example would be a simple reverse proxy in front of your web server,
+so that the client IP address your web server detects is always the IP address
+of this reverse proxy, and not of the original client. Often such reverse proxies
+generate an additional HTTP header, like <code>X-Forwarded-For</code>, which
+contains the original client IP address (or a list of IP addresses, if there are
+more cascading reverse proxies in front). It would be nice if we could use the
+content of such a header as the client IP address to pass to the backend.
+</p>
+<p>So we might need to manipulate some of the data that AJP sends to the backend.
+When using mod_jk inside Apache httpd you can use several httpd environment
+variables to let mod_jk know which data it should forward. These environment
+variables can be set by the httpd directives SetEnv or SetEnvIf, but also in a very flexible
+way using mod_rewrite (since httpd 2.x it can not only test against environment
+variables, but also set them).
+</p>
+<p>The following list contains all environment variables mod_jk checks before
+sending data to the backend:
+<ul>
+<li>JK_LOCAL_NAME: the local name
+</li>
+<li>JK_LOCAL_PORT: the local port
+</li>
+<li>JK_REMOTE_HOST: the client host XXX ??
+</li>
+<li>JK_REMOTE_ADDR: the client address
+</li>
+<li>JK_AUTH_TYPE: the authentication type
+</li>
+<li>JK_REMOTE_USER: the remote user
+</li>
+<li>HTTPS: On (case-insensitive) to indicate that HTTPS is used
+</li>
+<li>SSL_CIPHER: the SSL cipher
+</li>
+<li>SSL_CIPHER_USEKEYSIZE: the SSL key size
+</li>
+<li>SSL_CLIENT_CERT: the SSL client certificate
+</li>
+<li>SSL_CLIENT_CERT_CHAIN_: prefix of variable names containing
+the client certificate chain
+</li>
+<li>SSL_SESSION_ID: the SSL session ID
+</li>
+</ul>
+</p>
+<p>Remember: in general you don't need to set them. The module retrieves the data automatically
+from the web server. Only if you want to change this data do you need to overwrite it by
+using these variables.
+</p>
+<p>Some of these variables might also be used by other web server modules. All
+variables whose name does not begin with "JK" are set directly by Apache httpd.
+If you want to change the data, but do not want to negatively influence the behaviour
+of other modules, you can change the names of all variables mod_jk uses. For the
+details see the <a href="../reference/apache.html">Apache reference</a> page.
+</p>
+<p>All variables that are not SSL-related were only introduced in version 1.2.27.
+</p>
+</section>
+<section name="URL handling">
+<br/>
+<subsection name="URL rewriting">
+<p>Sometimes one wants to change path components of the URLs under which an application
+is available. Especially if a web application is deployed as some context, say <code>/myapp</code>,
+marketing prefers short URLs, so wants the application to be directly available under
+<code>http://www.mycompany.com/</code>. Although you can deploy the application as the so-called
+ROOT context, which will be directly available at "/", admins often prefer not to use
+the ROOT context, e.g. because only one application can be the root context (per host).
+</p>
+<p>The procedure to change the URLs in the reverse proxy is tedious, because often
+an application produces self-referencing URLs, which then include the path components
+that you tried to hide from the outside world. Nevertheless, if you absolutely need to do it,
+here are the steps.
+</p>
+<p>Case A: You need to make the application available at a simple URL, but it is OK if
+users proceed using the more complex URLs, as long as they don't have to type them in.
+That's the easy case, and if this suffices for you, you're lucky. Use a simple RedirectMatch
+for Apache httpd:
+</p>
+<source>
+RedirectMatch ^/$ http://www.mycompany.com/myapp/
+</source>
+<p>Your application will then be available under <code>http://www.mycompany.com/</code>,
+and each visitor will be immediately redirected to the real URL
+<code>http://www.mycompany.com/myapp/</code>.
+</p>
+<p>Case B: You need to hide path components for all requests going to the application.
+Here's the recipe for the case where you want to hide the first path component
+<code>/myapp</code>. More complex manipulations are left as an exercise to the reader.
+First the solution for the case of Apache httpd:
+</p>
+<p>1. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html"><code>mod_rewrite</code></a>
+to add <code>/myapp</code> to all requests before forwarding to the backend:
+</p>
+<source>
+# Don't forget the PT flag! (pass through)
+RewriteRule ^/(.*) http://www.mycompany.com/myapp/$1 [PT]
+</source>
+<p>2. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_headers.html"><code>mod_headers</code></a>
+to rewrite any HTTP redirects your application might return. Such redirects typically contain
+the path components you want to hide, because by the HTTP standard, redirects always need to include
+the full URL, and your application is not aware of the fact that your clients talk to it via
+some shortened URL. An HTTP redirect is done with a special header named <code>Location</code>.
+We rewrite the Location headers of our responses:
+</p>
+<source>
+# Keep protocol, server and port if present,
+# but remove our webapp name from the rest of the URL
+Header edit Location ^([^/]*//[^/]*)?/myapp/(.*)$ $1/$2
+</source>
+<p>3. Use <code>mod_headers</code> again to rewrite the paths contained in any cookies
+your application might set. Such cookie paths again might contain
+the path components you want to hide.
+A cookie is set with the HTTP header named <code>Set-Cookie</code>.
+We rewrite the Set-Cookie headers of our responses:
+</p>
+<source>
+# Fix the cookie path
+Header edit Set-Cookie "^(.*; Path=/)myapp/?(.*)" $1$2
+</source>
+<p>4. Some applications might contain hard coded absolute links.
+In this case, check whether your web framework offers a configuration item
+to configure the base URL. If not, your only chance is to parse all response
+content bodies and do search and replace. This is fragile and very resource intensive.
+If you really need to do this, you can use
+<a href="http://apache.webthing.com/mod_proxy_html/"><code>mod_proxy_html</code></a>,
+<a href="http://httpd.apache.org/docs/2.2/mod/mod_substitute.html"><code>mod_substitute</code></a>
+or <a href="http://blogs.sun.com/basant/entry/using_mod_sed_to_filter"><code>mod_sed</code></a>
+for this task.
+</p>
+<p>If you are using Microsoft IIS as a web server, the ISAPI plugin provides a way
+of doing the first step with a built-in feature. You define a mapping file for simple prefix
+changes like this:
+</p>
+<source>
+# Add a context prefix ...
+/=/myapp/
+# ... or change some prefix ...
+/oldapp/=/myapp/
+</source>
+<p>and then put the name of the file in the <code>rewrite_rule_file</code> entry of the registry or your
+<code>isapi_redirect.properties</code> file. In your <code>uriworkermap.properties</code> file, you
+still need to map the URLs as they are before rewriting!
+</p>
+<p>More complex rewrites can be done using the same file, but with regular expressions. A leading
+tilde sign '<code>~</code>' indicates that you are using a regular expression:
+</p>
+<source>
+# Use a regular expression rewrite
+~/oldapps([0-9]*)/=/newapps$1/
+</source>
+<p>There is no support for Steps 2 (rewriting redirect responses) or 3 (rewriting cookie paths).
+</p>
+</subsection>
+<subsection name="URL encoding">
+<p>Another type of problem is triggered by the use of encoded URLs
+(see <a href="http://en.wikipedia.org/wiki/Percent-encoding">percent encoding</a>).
+For the same location there exist
+a lot of different URLs which are equivalent. The reverse proxy needs to inspect the URL in order
+to apply its own authentication rules and to decide to which backend it should send the request
+(or whether it should handle it itself). Therefore the request URL is first normalized:
+percent encoded characters are decoded, <code>/./</code> is replaced by <code>/</code>,
+<code>/XXX/../</code> is replaced by <code>/</code> and similar manipulations of the URL are done.
+After that, the web server might apply rewrite rules to further change the URL in less obvious ways.
+In the end there is no longer any way to put the resulting URL into an encoding which is "similar" to
+the one which was used for the original URL.
+</p>
+<p>
+For historical reasons, there have been several alternatives for how mod_jk and the ISAPI
+plugin encoded the resulting URL before sending it to the backend. They could be chosen via
+<code>JkOptions</code> (Apache httpd) or <code>uri_select</code> (ISAPI). None of those historical
+encodings is recommended, because they either have negative functionality implications or
+pose a security risk. The default encoding since version 1.2.24 is <code>ForwardURIProxy</code>
+(Apache httpd) or <code>proxy</code> (ISAPI) and it is strongly recommended to keep the default.
+</p>
+</subsection>
+</section>
+<section name="request attributes">
+<br/>
+<p>
+You can also add more attributes to any request you are forwarding when using Apache httpd.
+For this use the <code>JkEnvVar</code> directive (for details see the
+<a href="../reference/apache.html">Apache reference</a> page). Such request attributes can be
+retrieved on the Tomcat side via <code>request.getAttribute(attributeName)</code>.
+Note that their names will not be listed in <code>request.getAttributeNames()</code>!
+</p>
+</section>
+</body>
+</document>

Propchange: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
------------------------------------------------------------------------------
svn:eol-style = native

Propchange: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
------------------------------------------------------------------------------
svn:keywords = Author Date Id Revision