On 17 Dec 2013, at 10:32, Thomas Eckert wrote:

> I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do
> the detection magic but mod_xml2enc fails to detect compressed content
> correctly. Hence a simple "ProxyHTMLEnable" fails when content compression
> is in place.
Aha! Revisiting that, I see I still have an uncommitted patch to make the
content types to process configurable. I think that was an issue you
originally raised? But compression is another issue.

> To work around this without dropping support for content compression you
> can do
>
>   SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
>
> or at least that was the kind-of-result of the half-finished discussion
> last time.

I didn't find that discussion. But I suspect my reaction would have included
a certain aversion to that level of processing overhead in the proxy in
these days of fatter pipes and hardware compression.

> Suppose the client does
>
>   GET /something.tar.gz HTTP/1.1
>   ...
>   Accept-Encoding: gzip, deflate
>
> to which the backend will respond with 200 but *not* send a
> "Content-Encoding" header since the content is already encoded. Using the
> above filter chain "corrupts" the content because it will be inflated and
> then deflated, double-compressing it in the end.

Hmmm? If the backend sends compressed contents with no Content-Encoding,
doesn't that imply:

1. INFLATE doesn't see an encoding, so steps away.
2. xml2enc and proxy-html can't parse compressed content, so step away
   (log an error?)
3. DEFLATE ... aha, that's what you meant about double-compression.

In effect the whole chain is reduced to just DEFLATE. That's a bit
nonsensical but not incorrect, and the user-agent will reverse the DEFLATE
and restore the original from the backend, yesno?

> Imho this whole issue lies with proxy_html using xml2enc to do the content
> type detection and xml2enc failing to detect the content encoding. I guess
> all it really takes is to have xml2enc inspect the headers_in to see if
> there is a "Content-Encoding" header and then add the inflate/deflate
> filters (unless there is a general reason not to rely on the input
> headers, see below).

Well, in this particular case, surely it lies with the backend? But is the
real issue anything more than an inability to use ProxyHTMLEnable with
compressed contents? In which case, wouldn't mod_proxy_html be the place to
patch? Have it test for compressed content and insert INFLATE at the same
point as it inserts xml2enc?

> Of course, this whole issue would disappear if inflate/deflate would be
> run automagically (upon seeing a Content-Encoding header) in general.
> Anyway, what's the reasoning behind not having them run always and give
> them the knowledge (e.g. about the input headers) to get out of the way
> if necessary?

That's an interesting thought. mod_deflate will of course do exactly that if
configured, so the issue seems to boil down to configuring that filter
chain. The ultimate chain here would be:

1. INFLATE     // unpack compressed contents
2. xml2enc     // deal with charset for libxml2/mod_proxy_html
3. proxy-html  // fix URLs
4. xml2enc     // set an output encoding other than utf-8
5. DEFLATE     // compress

That's not possible with SetOutputFilter or FilterChain & family, because
you can't configure both instances of xml2enc at once (that's what
ProxyHTMLEnable deals with). But of those, 4 and 5 seem low-priority as
they're not doing really essential things.

Returning to:

> SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE

AFAICS the only thing that's missing is the nonessential step 4 above. Am I
missing something?

-- 
Nick Kew
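
P.S. For concreteness, here is roughly what "configuring that filter chain"
could look like using mod_filter's provider dispatch in 2.4. This is an
untested sketch: the filter names (unpack, enc, fixurls, pack) are labels I
made up, and the expressions are only my guess at suitable conditions.

  # unpack only when the backend actually sent compressed content
  FilterDeclare  unpack
  FilterProvider unpack INFLATE "%{resp:Content-Encoding} =~ /gzip|deflate/"

  # charset handling for libxml2, then the URL-fixing filter,
  # for HTML responses only
  FilterDeclare  enc
  FilterProvider enc xml2enc "%{CONTENT_TYPE} =~ m#^text/html#"
  FilterDeclare  fixurls
  FilterProvider fixurls proxy-html "%{CONTENT_TYPE} =~ m#^text/html#"

  # recompress only if the client asked for it
  FilterDeclare  pack
  FilterProvider pack DEFLATE "%{req:Accept-Encoding} =~ /gzip/"

  FilterChain unpack enc fixurls pack

That still leaves out step 4 (the second xml2enc instance), for the reason
above, but each of the other filters should get out of the way when its
condition doesn't apply.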
