Hi,

We've been working on a lingering HTTP-compliance issue in mod_pagespeed:
respecting Vary:User-Agent.  mod_pagespeed needs to cache resources in order
to optimize them.  The economics of this make sense when the server
optimizes a resource, and saves the optimization for serving to multiple
clients.

The problem is that this is, in general, expensive to do correctly when the
site owner has put Vary:User-Agent in the response header for, say, a css or
javascript file.  There are legitimate reasons to do this, such as serving a
different version of a CSS file to IE6.  But I think most sites don't do
that.  However, there is a disturbing passage in the documentation for
mod_deflate (http://httpd.apache.org/docs/2.2/mod/mod_deflate.html):

SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

This encourages all site owners to add Vary:User-Agent to all css and js
files, whether they actually vary in content or not.

Does anyone know the history of this recommendation?  Surely that is an
inappropriate recommendation for mod_deflate.  Vary:Accept-Encoding makes
sense in the context of that filter, but not Vary:User-Agent.

The problem with defensively setting Vary:User-Agent is that any proxy cache
-- and in this respect mod_pagespeed acts like a proxy cache -- must fetch
origin content for each distinct user-agent.  While it's feasible for us to
employ a level of indirection in our cache so that we only store extra
copies of a resource when it actually differs, this can still, I fear, be
catastrophic to the cache working set and hit rate: we couldn't get around
storing an entry for each distinct user-agent.
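
To make the indirection idea concrete, here is a minimal sketch in Python
(not mod_pagespeed code).  The class name, the `fetch` callback, and the
use of SHA-256 as the content key are all assumptions made for
illustration.  It shows that bodies are de-duplicated, but the index still
grows by one entry, and costs one origin fetch, per distinct user-agent.

```python
import hashlib


class VaryIndirectCache:
    """Two-level cache: (url, user-agent) -> content hash -> body.

    Hypothetical illustration only; not how mod_pagespeed is implemented.
    """

    def __init__(self, fetch):
        self.fetch = fetch        # assumed origin fetcher: (url, ua) -> bytes
        self.index = {}           # (url, ua) -> content hash, one per user-agent
        self.bodies = {}          # content hash -> body, one per distinct body
        self.origin_fetches = 0   # counts round trips to the origin

    def get(self, url, user_agent):
        key = (url, user_agent)
        digest = self.index.get(key)
        if digest is None:
            # An unseen user-agent always costs an origin fetch, even when
            # the body turns out to be identical to one already stored.
            body = self.fetch(url, user_agent)
            self.origin_fetches += 1
            digest = hashlib.sha256(body).hexdigest()
            self.index[key] = digest
            self.bodies.setdefault(digest, body)
        return self.bodies[digest]
```

With an origin that serves the same bytes to everyone, N distinct
user-agents produce N index entries and N origin fetches but only one
stored body, which is the working-set concern described above.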


So, there are two questions:

1. Who can I lobby to get the recommendation changed for the mod_deflate
doc?  That recommendation seems incorrect and/or obsolete.
2. Given that there are likely a huge number of sites that blindly followed
that recommendation, is there a straightforward way for mod_pagespeed to
correct the situation?  Specifically, can mod_pagespeed get access to apache
configuration parameters that were added by other filters, looking
specifically for the pattern quoted above with SetEnvIfNoCase & Header?  Is
it OK for mod_pagespeed to register for config-parameters owned by another
module?

I think what we'd do is basically let mod_pagespeed ignore "Vary:User-Agent"
if we saw that it was inserted per this exact pattern.  This would, to be
pedantic, violate HTTP, but I think it would help make the web faster, and
in practice would help many more sites than it would hurt.  Sites that
specifically added Vary:User-Agent using a more specific construct, such as
identifying a particular CSS file that they want to Vary, would be treated
differently.
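
As a rough sketch of that heuristic, again in Python and with hypothetical
function names: the first function scans configuration text for the exact
Header directive from the mod_deflate doc, and the second drops User-Agent
from the Vary fields used for caching when that pattern was found.  Scanning
text is only a stand-in here; whether a module can actually see directives
owned by mod_headers at runtime is the open question above.

```python
import re

# The exact directive from the mod_deflate documentation example.
# A more specific construct (no "env=!dont-vary") will not match.
DEFLATE_BOILERPLATE = re.compile(
    r"^[ \t]*Header[ \t]+append[ \t]+Vary[ \t]+User-Agent"
    r"[ \t]+env=!dont-vary[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def has_deflate_boilerplate(config_text):
    """Return True if the config text contains the documented pattern."""
    return DEFLATE_BOILERPLATE.search(config_text) is not None


def effective_vary(vary_header, ignore_user_agent):
    """Return the Vary fields the cache should actually key on."""
    fields = [f.strip() for f in vary_header.split(",") if f.strip()]
    if ignore_user_agent:
        fields = [f for f in fields if f.lower() != "user-agent"]
    return fields
```

For example, a response carrying "Vary: Accept-Encoding, User-Agent" from a
site whose config matches the pattern would be cached as if it only varied
on Accept-Encoding, while a site that appends Vary User-Agent some other
way would keep both fields.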

-Josh
