There are at least two bug reports about mod_rewrite unescaping URLs and then passing the unescaped backreferences into the rewrite target (bugs 34602 and 32328 deal with this in httpd-2, and I recall another bug report for 1.3). As far as I can tell, the real problem is that httpd unescapes the URL before passing it to the mapper modules.
E.g. the rule
RewriteRule ^/(.*)$ /index.php?title=$1 [L]
rewrites URLs like /Foo to /index.php?title=Foo, but as soon as the URL contains an escaped special character, such as an escaped hash, plus or ampersand, the result is not as intended: /Foo%2BBar (/Foo+Bar urlencoded) is rewritten to /index.php?title=Foo Bar (instead of /index.php?title=Foo+Bar) and, even worse, /Foo%23Bar (/Foo#Bar urlencoded) is rewritten to /index.php?title=Foo#Bar (instead of /index.php?title=Foo%23Bar), so that everything after the hash is ignored entirely.
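
To make the two failure cases concrete, here is a small Python sketch that mimics the sequence described above: unescape the request path, apply the rule, then parse the resulting query string the way a script would. It is only a model under my own assumptions, not httpd's actual code path; the function names are made up for illustration.

```python
import re
from urllib.parse import unquote, urlsplit, parse_qs

def rewrite_unescaped(raw_path):
    """Model of: RewriteRule ^/(.*)$ /index.php?title=$1 [L]
    applied to a path that has already been unescaped (hypothetical helper)."""
    decoded = unquote(raw_path)            # the unescaping step described above
    match = re.match(r"^/(.*)$", decoded)  # the rule's pattern
    return "/index.php?title=" + match.group(1)

def title_seen_by_script(target):
    """What a script would read as 'title' after normal query-string decoding."""
    query = urlsplit(target).query         # anything after '#' is a fragment, not query
    return parse_qs(query, keep_blank_values=True).get("title", [""])[0]

if __name__ == "__main__":
    for path in ("/Foo", "/Foo%2BBar", "/Foo%23Bar"):
        target = rewrite_unescaped(path)
        print(path, "->", target, "-> title =", repr(title_seen_by_script(target)))
```

In this model the plus sign comes back as a space and the part after the hash never reaches the script, which matches the two cases above.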

I know there are workarounds for this problem, such as using the untouched %{THE_REQUEST} variable in a rewrite condition or inspecting the raw request in the script that gets executed (like wikimedia does), but these are suboptimal. I have written a patch that tries to address the problem. To remain backwards-compatible I did not change the original rewrite behaviour but instead added a new flag to indicate that backreferences should be escaped. Adding the flag [B] or [backrefescaping] to a RewriteRule makes mod_rewrite escape the backreferences in the rewrite target, e.g.
RewriteRule ^/(.*)$ /index.php?title=$1 [L,B]
This forces the backreferenced parts to be re-encoded when the rewrite target is constructed.
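
The intended effect of the flag can be sketched in Python as follows. This only models the idea (re-encode the captured group before it is substituted); the exact set of characters the patch escapes may differ from what `urllib.parse.quote` does here, and the function and parameter names are invented for the example.

```python
import re
from urllib.parse import unquote, quote, urlsplit, parse_qs

def rewrite(raw_path, escape_backrefs=False):
    """Model of: RewriteRule ^/(.*)$ /index.php?title=$1 [L] or [L,B]
    (hypothetical helper; escape_backrefs stands in for the B flag)."""
    decoded = unquote(raw_path)                # path arrives unescaped
    backref = re.match(r"^/(.*)$", decoded).group(1)
    if escape_backrefs:
        backref = quote(backref, safe="")      # re-encode before substitution
    return "/index.php?title=" + backref

def title_seen_by_script(target):
    """What a script would read as 'title' after normal query-string decoding."""
    query = urlsplit(target).query
    return parse_qs(query, keep_blank_values=True).get("title", [""])[0]

if __name__ == "__main__":
    for path in ("/Foo%2BBar", "/Foo%23Bar"):
        for flag in (False, True):
            target = rewrite(path, escape_backrefs=flag)
            print(path, "B" if flag else "-", target, repr(title_seen_by_script(target)))
```

With the re-encoding step enabled, the script in this model gets back the original title in both cases.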

The patch can be found here: http://issues.apache.org/bugzilla/attachment.cgi?id=20217
Note that it is against 2.2.4 because I couldn't get the SVN version to work.

Here is the patch for the doc (against SVN HEAD):
--- httpd/docs/manual/mod/mod_rewrite.xml.orig 2007-05-18 19:28:17.796875000 +0200
+++ httpd/docs/manual/mod/mod_rewrite.xml 2007-05-18 19:18:25.078125000 +0200
@@ -1176,6 +1176,19 @@
       following flags: </p>

       <ul>
+               <li>'<strong><code>backrefescaping|B</code></strong>'
+               Escapes the backreferences in the substitution string for
+               use as query string arguments.
+<example>
+RewriteRule ^(.*)$     index.php?show=$1       [B,L]
+</example>
+               If you do not use this flag, escaping of the URL will be done
+               before the backreference is placed. This will not work if the initial
+               URL contains any special characters that need escaping.
+               In the given example, loading the URL http://example.com/C++ would
+               do an internal redirect to index.php?show=C%2B%2B instead of
+               index.php?show=C++ (which would possibly not give the result intended).
+               </li>
         <li>'<strong><code>chain|C</code></strong>'
         (<strong>c</strong>hained with next rule)<br />
          This flag chains the current rule with the next rule

--
Günther
