Re: [PATCH] BUG/MEDIUM: cache: Fix hash collision in `accept-encoding` handling for `Vary`

Willy Tarreau Thu, 31 Dec 2020 00:48:45 -0800

Hi Rémi,

(sorry for the lag, still on vacation, I don't look at haproxy every day :-))

On Tue, Dec 29, 2020 at 10:31:58AM +0100, Remi Tricot-Le Breton wrote:
> > I remain convinced that now that we have the bitmap, we're probably
> > deploying too many efforts to allow caching of unknown encodings with
> > all the consequences it implies, and that just like in the past we would
> > simply not cache when Vary was presented, not caching when an unknown
> > encoding is presented would seem perfectly acceptable to me, especially
> > given the already wide list of known encodings. And when I see the test
> > using "Accept-Encoding: br,gzip,xxx,jdcqiab", this comforts me in this
> > idea, because my feeling here is that the client supports two known
> > encodings and other ones, if we find an object in the cache using no
> > more than these encodings, it can be used, otherwise the object must
> > be fetched from the server, and the response should only be cached if
> > it uses known encodings. And if the server uses "jdcqiab" as one of
> > the encodings I don't care if it results in the object not being cached.
> 
> It would not be that hard to discard responses that have an unknown encoding
> but for now the content-encoding of the response is only parsed when the
> response has a vary header (it could actually have been limited to a
> response that varies on accept-encoding specifically). And considering that
> we probably don't want to parse the content-encoding all the time, responses
> with an unknown encoding would only be considered uncacheable when they have
> a vary, and would be cached when they don't. I'm totally fine with it if you
> are too, it would only require a few lines of explanation in the doc.

Well, if a content-encoding is specified in the response and there's no
vary header, it's the server's problem. I mean, we could be kind and helpful
and purposely decide not to cache in this case, but it remains the server's
responsibility to advertise accept-encoding in the vary header if an encoding
is used. And the server could very well decide that it knows its clients
always support the delivered encoding and purposely decide not to emit a
vary header for example.

Thus I'd suggest that we continue to inspect the content-encoding only
when vary is present, but just decide that when we find a token we don't
know, we simply refrain from caching. Based on the exhaustive list you've
already implemented, this will really affect no real-world use case at
the moment, and will allow us to totally get rid of the hash and the
collisions that come with it.
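
To make the idea concrete, here is a minimal sketch of that rule. It is
not the actual patch: the bit values, the shortened encoding table and
every function name below are assumptions made up for illustration, and
q-values in the header are ignored for brevity.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical bit assignments, not HAProxy's real definitions. */
#define ENC_GZIP    (1u << 0)
#define ENC_DEFLATE (1u << 1)
#define ENC_BR      (1u << 2)
#define ENC_UNKNOWN (1u << 7)   /* set for any token not in the table */

struct enc_name {
	const char *name;
	unsigned int bit;
};

/* Deliberately short table; the real list of known encodings is longer. */
static const struct enc_name known_encodings[] = {
	{ "gzip",    ENC_GZIP },
	{ "deflate", ENC_DEFLATE },
	{ "br",      ENC_BR },
};

/* Map one token of <len> bytes to its bit, or ENC_UNKNOWN. */
static unsigned int encoding_bit(const char *tok, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(known_encodings) / sizeof(known_encodings[0]); i++) {
		if (strlen(known_encodings[i].name) == len &&
		    strncasecmp(known_encodings[i].name, tok, len) == 0)
			return known_encodings[i].bit;
	}
	return ENC_UNKNOWN;
}

/* Turn a comma-separated encoding list into a bitmap. Parameters such
 * as ";q=0.5" are skipped, which is a simplification of this sketch.
 */
static unsigned int encoding_bitmap(const char *hdr)
{
	unsigned int map = 0;
	const char *p = hdr;

	while (*p) {
		size_t len;

		while (*p == ',' || *p == ' ' || *p == '\t')
			p++;
		if (!*p)
			break;
		len = strcspn(p, ",; \t");
		map |= encoding_bit(p, len);
		p += len;
		p += strcspn(p, ",");   /* drop parameters up to the next comma */
	}
	return map;
}

/* Storing: refuse to cache as soon as one content-encoding is unknown. */
static int response_is_cacheable(const char *content_encoding)
{
	return !(encoding_bitmap(content_encoding) & ENC_UNKNOWN);
}

/* Lookup: a cached entry is usable only if it uses no encoding outside
 * of what the client accepts. Cached entries never carry ENC_UNKNOWN.
 */
static int cached_entry_usable(unsigned int client_map, unsigned int entry_map)
{
	return (entry_map & ~client_map) == 0;
}
```

With "Accept-Encoding: br,gzip,xxx,jdcqiab", the client bitmap would
contain br, gzip and the unknown bit, so a cached gzip or br object
matches, while a response encoded with "jdcqiab" is simply never stored.
No hash is involved anywhere, so there is nothing left to collide.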

And as Tim mentioned, the tests you've implemented are easy to extend
to support new encodings in the future, so I think it's worth doing it
this way. But I'm also fine with alternate proposals, it's just that I
suspect that any other solution would end up more complex for no
real-world benefit.

Thanks,
Willy
