Hi all,
I'm currently facing an issue where the directive ProxyHTMLURLMap does
not work as I expect. I am not sure whether that is by design or a bug,
and I would appreciate some feedback.
Let's assume an imaginary backend server delivers an HTML page that
contains a link like this:
<a href="http://internal/!%22%23$/">A link with special characters</a>
Please note that %22 is the double quote, which needs to be encoded so
that it does not break the HTML attribute, and %23 is the '#' character,
which we don't want to be treated as the start of an anchor in this
case. So, the unencoded URL would look like this:
http://internal/!"#$/
Now, Apache configured as a reverse proxy should rewrite this link to
http://external/!"#$/ (or http://external/!%22%23$/), but should leave
alone any links outside the subdirectory /!"#$/ (or /!%22%23$/). An
imaginary configuration to achieve that, and to showcase the issue I am
trying to get feedback on, looks like this:
ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"
Please note that the double quote is only escaped here with a backslash
to cater for the Apache configuration syntax requirements. This does not
work, i.e. the URL in the HTML document doesn't get rewritten.
Let's try to better understand what exactly is happening here. Looking
into the code of mod_proxy_html.c (trunk, SVN rev. 1832252, lines
524-525), this is where the string comparison happens:
    s_from = strlen(m->from.c);
    if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
        ... do the string replacement ...
Here ctx->buf is the URL found in the HTML document, and m->from.c is
the first argument configured for ProxyHTMLURLMap. If the latter is a
prefix of the former (ignoring case), the condition is true and the
string replacement happens. In my case the expected replacement doesn't
happen: the condition is false, and the values of the variables are:
ctx->buf = http://internal/!%22%23$/
m->from.c = http://internal/!"#$/
So, the strings don't match and are not replaced for that reason.
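To make the mismatch easy to reproduce outside of httpd, here is a
minimal standalone sketch of the same comparison. The function name
url_map_matches is hypothetical and not part of mod_proxy_html; it only
mirrors the strncasecmp() prefix test quoted above.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Hypothetical stand-in for the check in mod_proxy_html.c:
 * returns 1 if `from` is a case-insensitive prefix of `buf`, else 0.
 */
int url_map_matches(const char *buf, const char *from)
{
    assert(buf != NULL && from != NULL);
    return strncasecmp(buf, from, strlen(from)) == 0;
}
```

Called with the two values shown above, the comparison fails at the
first encoded character: '%' in the document versus '"' in the
configured pattern.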
Going forward, I am not interested in finding a workaround for this, but
rather in how to approach a fix (if this is a bug at all).
Is it reasonable to expect mod_proxy_html to rewrite percent-encoded
URLs as well?
Let's assume this needs to be fixed. To make the strings match, we could
either percent-encode the value from the Apache directive
ProxyHTMLURLMap, or temporarily URL-decode the string found in the HTML
document just for the purpose of the string comparison. What is the
right thing to do?
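To illustrate the second option, here is a rough sketch of a comparison
that decodes %XX sequences from the document on the fly. This is not a
proposed patch; the names decoded_prefix_len and hex_value are made up
for this example. It returns the number of bytes the prefix covers in
the raw (still encoded) string, because that length, and not
strlen(m->from.c), is what a replacement would have to cut out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper: compare `from` (the configured pattern, not
 * encoded) against the percent-decoded form of `buf`, ignoring case.
 * Returns the number of bytes of `buf` covered by the prefix, or 0 if
 * the prefix does not match. An empty `from` also yields 0.
 */
size_t decoded_prefix_len(const char *buf, const char *from)
{
    size_t i = 0;   /* index into buf (raw, possibly encoded) */
    size_t j = 0;   /* index into from */

    assert(buf != NULL && from != NULL);

    while (from[j] != '\0') {
        unsigned char c = (unsigned char)buf[i];
        size_t step = 1;

        if (c == '\0')
            return 0;   /* buf ended before the pattern did */

        if (c == '%') {
            /* Decode only a complete %XX; otherwise keep '%' literally. */
            int hi = hex_value((unsigned char)buf[i + 1]);
            int lo = (hi >= 0) ? hex_value((unsigned char)buf[i + 2]) : -1;
            if (hi >= 0 && lo >= 0) {
                c = (unsigned char)(hi * 16 + lo);
                step = 3;
            }
        }

        if (tolower(c) != tolower((unsigned char)from[j]))
            return 0;

        i += step;
        j++;
    }
    return i;
}
```

One caveat with this approach: it also treats an encoded reserved
character such as %2F as equal to a literal '/', although the two are
not necessarily equivalent in a URL. Whether that is acceptable here is
part of what I am asking.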
If you have managed to read all this down to this line, I am curious
about your feedback. :)
Regards,
Micha