Hi Tim,

I'm coming back to this one:

On Wed, Jun 12, 2019 at 10:36:51PM +0200, Tim Duesterhus wrote:
> This thread contains two "competing" patches to fix the BUG that HAProxy
> does not set the `Vary` response header when the compression filter is
> applied. When not setting the `Vary` header the response may be miscached
> by intermediate caching proxies.
> 
> Please select the one you like better, because I wasn't sure. I'll explain
> the differences below:
> 
> PATCH 1 (the one *without* v2):
> -------------------------------
> 
> This one attempts to only set the `Vary` response header when it's
> *required* to not pollute responses that are never going to be compressed
> based on the current configuration (e.g. because the Content-Type is not
> listed in `compression type`).
> 
> To do so the patch adds a new `would_compress` flag and requires careful
> checking in `htx_set_comp_reshdr`:
> 
> 1. All the response conditions must go first.
> 2. Then the `would_compress` flag must be set.
> 3. Then the other conditions (e.g. compression rate) must be checked.
> 
> Otherwise the `would_compress` flag might be missing due to a temporary
> condition, leading to a missing `Vary` header, leading to bugs.

So I'm a bit confused by what it does, because it *seems* to set the Vary
header even when the client signals that it supports no compression by
not sending any Accept-Encoding, which asks the cache to revalidate for
whatever Accept-Encoding it sees in a request.

I think that the real bug is in fact that we can return compressed
contents that do not advertise Vary, and that this one alone needs to
be addressed. The remaining cases are just cache optimizations and
will only serve to encourage caches to try to find a better
representation even when an uncompressed one is present. But I'm really
not convinced it's welcome, because if compression was enabled on haproxy
in the first place, it's to save bandwidth or download time. If a cache
is present between the client and haproxy, it will always be faster to
deliver the uncompressed object than it would be to fetch the same again
from haproxy hoping to get a different representation. Also, returning
Vary for all non-matching algos may result in cache pollution: if
someone fetches the same object through a cache a large number of times
with random Accept-Encoding values, all responses will be uncompressed
with a Vary header, and each one will result in a different copy in the
cache. Without the Vary header for uncompressed objects, all
non-matching algos may use the same single uncompressed representation.
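
To make the pollution case concrete, here is a toy model of a cache
holding a single URL. It is only a sketch and not HAProxy code: the
struct, the function name and the fixed-size table are all hypothetical,
and a real cache may normalize Accept-Encoding before keying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARIANTS 16

/* Hypothetical toy cache for ONE URL: remembers the secondary key
 * (Accept-Encoding value) of each stored copy. Not HAProxy code. */
struct toy_cache {
    const char *keys[MAX_VARIANTS];
    size_t count;
};

/* Store a response fetched with request header <accept_encoding>.
 * If the response carried "Vary: accept-encoding" (<has_vary> != 0),
 * the copy is keyed on the request's Accept-Encoding value; otherwise
 * all requests share one copy (empty key). Copies beyond MAX_VARIANTS
 * are silently dropped. Returns the number of stored copies. */
static size_t toy_cache_store(struct toy_cache *c,
                              const char *accept_encoding, int has_vary)
{
    const char *key;
    size_t i;

    assert(c != NULL && accept_encoding != NULL);
    key = has_vary ? accept_encoding : "";

    for (i = 0; i < c->count; i++)
        if (strcmp(c->keys[i], key) == 0)
            return c->count;        /* this variant is already cached */

    if (c->count < MAX_VARIANTS)
        c->keys[c->count++] = key;  /* new variant: one more copy */
    return c->count;
}
```

Storing three uncompressed responses fetched with Accept-Encoding "foo",
"bar" and "baz" yields three copies when has_vary is set, and a single
shared copy when it is not.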

So in my opinion we should only emit "Vary: accept-encoding" when
adding Content-Encoding. Am I missing something?
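
As a rough sketch of the rule I have in mind (again not the actual
filter code: the function name and the buffer-based interface are
made up for illustration, and merging with a Vary header already set by
the server is left out), both headers are emitted together or not at all:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of HAProxy. Writes the headers the
 * compression step would add into <out>. <algo> is the selected
 * algorithm name, or NULL/empty when the response is left uncompressed.
 * Returns the number of bytes written, 0 if nothing is added, or -1 if
 * <out> is too small. */
static int toy_emit_comp_headers(char *out, size_t size, const char *algo)
{
    int n;

    assert(out != NULL && size > 0);

    if (algo == NULL || strlen(algo) == 0) {
        /* not compressing: no Content-Encoding, hence no Vary either */
        out[0] = '\0';
        return 0;
    }

    /* compressing: Vary always goes along with Content-Encoding */
    n = snprintf(out, size,
                 "Content-Encoding: %s\r\n"
                 "Vary: accept-encoding\r\n", algo);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With algo set to "gzip" the buffer gets both header lines; with NULL it
stays empty, so uncompressed responses remain shareable by a cache.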

Thanks,
Willy
