Author: rjung
Date: Thu Mar  5 01:33:59 2009
New Revision: 750276

URL: http://svn.apache.org/viewvc?rev=750276&view=rev
Log:
Adding a new documentation page about the special
situation of a reverse proxy.
Needs some checking and proof reading.

Added:
    tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml   (with props)

Added: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
URL: 
http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml?rev=750276&view=auto
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml (added)
+++ tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml Thu Mar  5 
01:33:59 2009
@@ -0,0 +1,303 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<!DOCTYPE document [
+  <!ENTITY project SYSTEM "project.xml">
+]>
+<document url="proxy.html">
+
+  &project;
+<copyright>
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+ 
+       http://www.apache.org/licenses/LICENSE-2.0
+ 
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+</copyright>
+<properties>
+<title>Reverse Proxy HowTo</title>
+<author email="rj...@apache.org">Rainer Jung</author>
+<date>$Date$</date>
+</properties>
+<body>
+<section name="Introduction"> 
+<br/>
+<p>The Apache module mod_jk and its ISAPI and NSAPI variants connect
+a web server to a backend (typically Tomcat) using the AJP protocol.
+The web server receives an HTTP(S) request and the module forwards
+the request to the backend. This situation is usually called a gateway
+or a proxy; in the context of HTTP it is called a reverse proxy.
+</p>
+</section>
+<section name="Typical problems">
+<br/>
+<p>A reverse proxy is not totally transparent to the application on
+the backend. For instance the host name and port the original client
+(e.g. browser) needs to talk to belong to the web server and not to the
+backend, so the reverse proxy talks to a different host name and port.
+When the application on the backend returns content including
+self-referencing URLs using its own backend address and port, the
+client will usually not be able to use these URLs.
+</p>
+<br/>
+<p>Another example is the client IP address, which for the web server is the
+source IP of the incoming connection, whereas for the backend the
+connection always comes from the web server. This can be a problem when
+the client IP is checked by the backend application, e.g. for security reasons.
+</p>
+</section>
+<section name="AJP as a solution">
+<br/>
+<p>Most of these problems are automatically handled by the AJP protocol
+and the AJP connectors of the backend. The AJP protocol transports
+this communication metadata and the backend connector presents this
+metadata whenever the application asks for it using Servlet API methods.
+</p>
+<p>The following list contains the communication metadata handled by AJP
+and the HttpServletRequest API calls which can be used to retrieve them:
+<ul>
+<li>local name: <code>getLocalName()</code> and <code>getLocalAddr()</code>.
+This is also equal to <code>getServerName()</code>, unless a <code>Host</code> header
+is contained in the request. In this case the server name is taken from that header.
+</li>
+<li>local port: <code>getLocalPort()</code>.
+This is also equal to <code>getServerPort()</code>, unless a <code>Host</code> header
+is contained in the request. In this case the server port is taken from that header
+if it contains an explicit port, or is equal to the default port of the scheme used.
+</li>
+<li>client address: <code>getRemoteAddr()</code>
+</li>
+<li>client host: <code>getRemoteHost()</code>
+</li>
+<li>authentication type: <code>getAuthType()</code>
+</li>
+<li>remote user: <code>getRemoteUser()</code>,
+if <code>tomcatAuthentication="false"</code>
+</li>
+<li>protocol: <code>getProtocol()</code>
+</li>
+<li>HTTP method: <code>getMethod()</code>
+</li>
+<li>URI: <code>getRequestURI()</code>
+</li>
+<li>HTTPS used: <code>isSecure()</code>, <code>getScheme()</code>
+</li>
+<li>query string: <code>getQueryString()</code>
+</li>
+<li>SSL cipher: <code>getAttribute("javax.servlet.request.cipher_suite")</code>
+</li>
+<li>SSL key size: <code>getAttribute("javax.servlet.request.key_size")</code>
+</li>
+<li>SSL client certificate: <code>getAttribute("javax.servlet.request.X509Certificate")</code>
+</li>
+<li>SSL session ID: <code>getAttribute("javax.servlet.request.ssl_session")</code>.
+This attribute is Tomcat-specific; it has not yet been standardized.
+</li>
+</ul>
+</p>
+</section>
+<section name="Fine tuning">
+<br/>
+<p>In some situations this is not enough though. Assume there is another
+less clever reverse proxy in front of your web server, for instance an
+HTTP load balancer or similar device which also serves as an SSL accelerator.
+</p>
+<p>Then you know that all your clients use HTTPS, but your web server doesn't
+know about that. All it can see is requests coming from the accelerator using
+plain HTTP.
+</p>
+<p>Another example would be a simple reverse proxy in front of your web server,
+so that the client IP address your web server detects is always the IP address
+of this reverse proxy, and not of the original client. Often such reverse proxies
+generate an additional HTTP header, like <code>X-Forwarded-For</code>, which
+contains the original client IP address (or a list of IP addresses, if there are
+more cascading reverse proxies in front). It would be nice if we could use the
+content of such a header as the client IP address to pass to the backend.
+</p>
+<p>So we might need to manipulate some of the data that AJP sends to the backend.
+When using mod_jk inside Apache httpd you can use several httpd environment
+variables to let mod_jk know which data it should forward. These environment
+variables can be set by the httpd directives SetEnv or SetEnvIf, but also in a very
+flexible way using mod_rewrite (since httpd 2.x it can not only test against
+environment variables, but also set them).
+</p>
+<p>The following list contains all environment variables mod_jk checks before
+sending data to the backend:
+<ul>
+<li>JK_LOCAL_NAME: the local name
+</li>
+<li>JK_LOCAL_PORT: the local port
+</li>
+<li>JK_REMOTE_HOST: the client host XXX ??
+</li>
+<li>JK_REMOTE_ADDR: the client address
+</li>
+<li>JK_AUTH_TYPE: the authentication type
+</li>
+<li>JK_REMOTE_USER: the remote user
+</li>
+<li>HTTPS: set to On (case-insensitive) to indicate that HTTPS is used
+</li>
+<li>SSL_CIPHER: the SSL cipher
+</li>
+<li>SSL_CIPHER_USEKEYSIZE: the SSL key size
+</li>
+<li>SSL_CLIENT_CERT: the SSL client certificate
+</li>
+<li>SSL_CLIENT_CERT_CHAIN_: prefix of variable names containing
+the client certificate chain
+</li>
+<li>SSL_SESSION_ID: the SSL session ID
+</li>
+</ul>
+</p>
+<p>Remember: in general you don't need to set them. The module retrieves the data
+automatically from the web server. Only if you want to change this data, you can
+overwrite it by using these variables.
+</p>
+<p>Some of these variables might also be used by other web server modules. All
+variables whose name does not begin with "JK" are set directly by Apache httpd.
+If you want to change the data, but do not want to negatively influence the behaviour
+of other modules, you can change the names of all variables mod_jk uses. For the
+details see the <a href="../reference/apache.html">Apache reference</a> page.
+</p>
+<p>All variables that are not SSL-related were only introduced in version 1.2.27.
+</p>
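+<p>As a minimal sketch of how the variables above might be used, the following
+fragment assumes an SSL accelerator in front of httpd that adds an
+<code>X-Forwarded-For</code> header. The accelerator address
+<code>192.0.2.10</code> is a placeholder, and the header should only be
+trusted if it is set by a proxy you control:
+</p>
+<source>
+# Treat every request coming from the (hypothetical) accelerator as HTTPS
+SetEnvIf Remote_Addr "^192\.0\.2\.10$" HTTPS=On
+# Forward the first address in X-Forwarded-For as the client address
+SetEnvIf X-Forwarded-For "^([0-9a-fA-F.:]+)" JK_REMOTE_ADDR=$1
+</source>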
+</section>
+<section name="URL handling">
+<br/>
+<subsection name="URL rewriting">
+<p>Sometimes one wants to change path components of the URLs under which an application
+is available. Especially if a web application is deployed as some context, say <code>/myapp</code>,
+marketing prefers short URLs, and so wants the application to be directly available under
+<code>http://www.mycompany.com/</code>. Although you can deploy the application as the so-called
+ROOT context, which will be directly available at "/", admins often prefer not to use
+the ROOT context, e.g. because only one application can be the root context (per host).
+</p>
+<p>The procedure to change the URLs in the reverse proxy is tedious, because often
+an application produces self-referencing URLs, which then include the path components
+that you tried to hide from the outside world. Nevertheless, if you absolutely need to do it,
+here are the steps.
+</p>
+<p>Case A: You need to make the application available at a simple URL, but it is OK if
+users proceed using the more complex URLs, as long as they don't have to type them in.
+That's the easy case, and if this suffices for you, you're lucky. Use a simple RedirectMatch
+for Apache httpd:
+</p>
+<source>
+RedirectMatch ^/$ http://www.mycompany.com/myapp/
+</source>
+<p>Your application will then be available under <code>http://www.mycompany.com/</code>,
+and each visitor will be immediately redirected to the real URL
+<code>http://www.mycompany.com/myapp/</code>.
+</p>
+<p>Case B: You need to hide path components for all requests going to the application.
+Here's the recipe for the case where you want to hide the first path component
+<code>/myapp</code>. More complex manipulations are left as an exercise to the reader.
+First the solution for the case of Apache httpd:
+</p>
+<p>1. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html"><code>mod_rewrite</code></a>
+to add <code>/myapp</code> to all requests before forwarding to the backend:
+</p>
+<source>
+# Don't forget the PT flag! (pass through)
+RewriteRule ^/(.*) http://www.mycompany.com/myapp/$1 [PT]
+</source>
+<p>2. Use <a href="http://httpd.apache.org/docs/2.2/mod/mod_headers.html"><code>mod_headers</code></a>
+to rewrite any HTTP redirects your application might return. Such redirects typically contain
+the path components you want to hide, because by the HTTP standard, redirects always need to include
+the full URL, and your application is not aware of the fact that your clients talk to it via
+some shortened URL. An HTTP redirect is done with a special header named <code>Location</code>.
+We rewrite the Location headers of our responses:
+</p>
+<source>
+# Keep protocol, server and port if present,
+# but remove our webapp name from the rest of the URL
+Header edit Location ^([^/]*//[^/]*)?/myapp/(.*)$ $1/$2
+</source>
+<p>3. Use <code>mod_headers</code> again to rewrite the paths contained in any cookies
+your application might set. Such cookie paths again might contain
+the path components you want to hide.
+A cookie is set with the HTTP header named <code>Set-Cookie</code>.
+We rewrite the Set-Cookie headers of our responses:
+</p>
+<source>
+# Fix the cookie path by removing the webapp name
+Header edit Set-Cookie "^(.*; Path=/)myapp/?(.*)" $1$2
+</source>
+<p>4. Some applications might contain hard-coded absolute links.
+In this case, check whether you find a configuration item for your web framework
+to configure the base URL. If not, your only chance is to parse all response
+content bodies and do search and replace. This is fragile and very resource intensive.
+If you really need to do this, you can use
+<a href="http://apache.webthing.com/mod_proxy_html/"><code>mod_proxy_html</code></a>,
+<a href="http://httpd.apache.org/docs/2.2/mod/mod_substitute.html"><code>mod_substitute</code></a>
+or <a href="http://blogs.sun.com/basant/entry/using_mod_sed_to_filter"><code>mod_sed</code></a>
+for this task.
+</p>
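+<p>For illustration only, a minimal <code>mod_substitute</code> sketch of this
+last-resort approach might look like the following. It assumes that the module
+is loaded, that responses reach the filter uncompressed, and that the string
+<code>/myapp/</code> only occurs in links that should be shortened:
+</p>
+<source>
+&lt;Location "/"&gt;
+    # Filter HTML responses only
+    AddOutputFilterByType SUBSTITUTE text/html
+    # n = treat the pattern as a fixed string, not as a regular expression
+    Substitute "s|/myapp/|/|n"
+&lt;/Location&gt;
+</source>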
+<p>If you are using Microsoft IIS as a web server, the ISAPI plugin provides a way
+of doing the first step with a built-in feature. You define a mapping file for simple prefix
+changes like this:
+</p>
+<source>
+# Add a context prefix ...
+/=/myapp/
+# ... or change some prefix ...
+/oldapp/=/myapp/
+</source>
+<p>and then put the name of the file in the <code>rewrite_rule_file</code> entry of the registry or your
+<code>isapi_redirect.properties</code> file. In your <code>uriworkermap.properties</code> file, you
+still need to map the URLs as they are before rewriting!
+</p>
+<p>More complex rewrites can be done using the same file, but with regular expressions. A leading
+tilde sign '<code>~</code>' indicates that you are using a regular expression:
+</p>
+<source>
+# Use a regular expression rewrite
+~/oldapps([0-9]*)/=/newapps$1/
+</source>
+<p>There is no support for Step 2 (rewriting redirect responses) or Step 3 (rewriting cookie paths).
+</p>
+</subsection>
+<subsection name="URL encoding">
+<p>Another type of problem is triggered by the use of encoded URLs
+(see <a href="http://en.wikipedia.org/wiki/Percent-encoding">percent encoding</a>).
+For the same location there exist
+a lot of different URLs which are equivalent. The reverse proxy needs to inspect the URL in order
+to apply its own authentication rules and to decide to which backend it should send the request
+(or whether it should handle it itself). Therefore the request URL is first normalized:
+percent-encoded characters are decoded, <code>/./</code> is replaced by <code>/</code>,
+<code>/XXX/../</code> is replaced by <code>/</code> and similar manipulations of the URL are done.
+After that, the web server might apply rewrite rules to further change the URL in less obvious ways.
+In the end there is no way to put the resulting URL into an encoding which is "similar" to
+the one that was used for the original URL.
+</p>
+<p>
+For historical reasons, there have been several alternatives for how mod_jk and the ISAPI
+plugin encoded the resulting URL before sending it to the backend. They could be chosen via
+<code>JkOptions</code> (Apache httpd) or <code>uri_select</code> (ISAPI). None of those historical
+encodings is recommended, because they either have negative functional implications or
+pose a security risk. The default encoding since version 1.2.24 is <code>ForwardURIProxy</code>
+(Apache httpd) or <code>proxy</code> (ISAPI) and it is strongly recommended to keep the default.
+</p>
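+<p>If an older configuration still selects one of the historical encodings,
+the default can be restored explicitly. A minimal fragment for Apache httpd,
+assuming no later <code>JkOptions</code> directive overrides it:
+</p>
+<source>
+# Explicitly select the default URI encoding
+JkOptions +ForwardURIProxy
+</source>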
+</subsection>
+</section>
+<section name="Request attributes">
+<br/>
+<p>
+You can also add more attributes to any request you are forwarding when using Apache httpd.
+For this use the <code>JkEnvVar</code> directive (for details see the
+<a href="../reference/apache.html">Apache reference</a> page). Such request attributes can be
+retrieved on the Tomcat side via <code>request.getAttribute(attributeName)</code>.
+Note that their names will not be listed in <code>request.getAttributeNames()</code>!
+</p>
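+<p>A small sketch of how this could look; the variable name
+<code>MY_AREA</code> and its values are made up for illustration:
+</p>
+<source>
+# Set a (hypothetical) variable for part of the URL space
+SetEnvIf Request_URI "^/myapp/admin" MY_AREA=admin
+# Forward it as a request attribute; "public" is the default value
+JkEnvVar MY_AREA public
+</source>
+<p>With this fragment the application could read the value via
+<code>request.getAttribute("MY_AREA")</code>.
+</p>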
+</section>
+</body>
+</document>

Propchange: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: tomcat/connectors/trunk/jk/xdocs/generic_howto/proxy.xml
------------------------------------------------------------------------------
    svn:keywords = Author Date Id Revision


