I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do
the detection magic but mod_xml2enc fails to detect compressed content
correctly. Hence a simple "ProxyHTMLEnable" fails when content compression
is in place.

To work around this without dropping support for content compression you
can do

  SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE

or at least that was the kind-of-result of the half-finished discussion
last time.

Aside from being plain ugly and troublesome to use (e.g. if you want to use
AddOutputFilter somewhere else), the above also has a major shortcoming:
it mishandles already-compressed content.

Suppose the client does

  GET /something.tar.gz HTTP/1.1
  ...
  Accept-Encoding: gzip, deflate

to which the backend will respond with 200 but *not* send a
"Content-Encoding" header, since the content is already compressed. Using
the above filter chain "corrupts" the content: it is pushed through the
inflate/deflate chain anyway and ends up double-compressed.

Imho this whole issue lies with proxy_html using xml2enc to do the content
type detection and xml2enc failing to detect the content encoding. I guess
all it really takes is to have xml2enc inspect the headers_in to see if
there is a "Content-Encoding" header and then add the inflate/deflate
filters (unless there is a general reason not to rely on the input headers,
see below).
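
To make the header check concrete, here is a rough standalone sketch in
plain C. The helper name has_gzip_coding is made up for illustration and is
not part of mod_xml2enc or mod_deflate; it only assumes that the header
value is available as a C string (NULL if the header is absent). Unlike a
simple prefix comparison, it walks the comma-separated list of codings and
matches whole tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a Content-Encoding header value lists gzip (or the
 * legacy alias x-gzip) as one of its codings, 0 otherwise.
 * A NULL value means the header was absent.
 */
int has_gzip_coding(const char *value)
{
    const char *p = value;

    if (value == NULL)
        return 0;

    while (*p != '\0') {
        size_t len;

        /* Skip separators and optional whitespace between codings. */
        while (*p == ',' || *p == ' ' || *p == '\t')
            p++;

        /* Measure the current token up to the next separator. */
        len = strcspn(p, ", \t");
        if ((len == 4 && strncasecmp(p, "gzip", 4) == 0) ||
            (len == 6 && strncasecmp(p, "x-gzip", 6) == 0))
            return 1;

        p += len;
    }
    return 0;
}
```

Inside a filter the value would come from an apr_table_get() call on
whichever header table turns out to be the right one to consult.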

Adding the inflate/deflate filters inside xml2enc is where I need some
advice. For the deflate part I can probably do something like

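    /* The request belongs to the filter; there is no bare `r` in scope. */
    request_rec *r = f->r;
    const char *compression_method = apr_table_get(r->headers_in,
                                                   "Content-Encoding");
    /* Re-compress on the way out only if the content was gzip-encoded. */
    if (compression_method != NULL &&
        strncasecmp(compression_method, "gzip", 4) == 0) {
        ap_add_output_filter("deflate", NULL, r, r->connection);
    }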

but what about the inflate part? I can't simply add the inflate input
filter, because at that point (in mod_xml2enc's xml2enc_ffunc()) I would
need to "go back" in the input filter chain, which is afaik not
possible. So I would have to run the inflate input filter "in place".

Of course, this whole issue would disappear if inflate/deflate were run
automagically (upon seeing a Content-Encoding header) in general. So
what's the reasoning behind not always running them and giving them the
knowledge (e.g. about the input headers) to get out of the way if necessary?
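
To illustrate what "get out of the way if necessary" could mean, here is a
minimal sketch of such a decision in plain C. Everything in it is an
assumption made for illustration: the enum, the function name
choose_action, and the policy itself (only touch HTML/XHTML bodies, and
only when the coding is absent, identity, or gzip). It is not how
mod_deflate or mod_xml2enc currently behave.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical actions; the names are for illustration only. */
typedef enum {
    PASS_THROUGH,             /* leave the body untouched */
    REWRITE_ONLY,             /* body is plain markup, rewrite it directly */
    INFLATE_REWRITE_DEFLATE   /* body is gzip-encoded markup */
} rewrite_action;

/* Either argument may be NULL if the corresponding header is absent. */
rewrite_action choose_action(const char *content_type,
                             const char *content_encoding)
{
    /* Prefix match so that "text/html; charset=utf-8" also qualifies. */
    int is_markup = content_type != NULL &&
        (strncasecmp(content_type, "text/html", 9) == 0 ||
         strncasecmp(content_type, "application/xhtml+xml", 21) == 0);

    /* Non-markup (e.g. a .tar.gz download) is never touched. */
    if (!is_markup)
        return PASS_THROUGH;

    if (content_encoding == NULL ||
        strcasecmp(content_encoding, "identity") == 0)
        return REWRITE_ONLY;

    if (strcasecmp(content_encoding, "gzip") == 0 ||
        strcasecmp(content_encoding, "x-gzip") == 0)
        return INFLATE_REWRITE_DEFLATE;

    /* Unknown coding: we cannot decode it, so stay out of the way. */
    return PASS_THROUGH;
}
```

With a policy like this, the /something.tar.gz response from the example
above would fall into PASS_THROUGH and never reach the inflate/deflate
pair.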
