Hello Willy, Tim,

On 29/12/2020 07:52, Willy Tarreau wrote:
Hi Tim,

On Mon, Dec 28, 2020 at 12:47:52PM +0100, Tim Duesterhus wrote:
This patch fixes GitHub Issue #988. Commit 
ce9e7b25217c46db1ac636b2c885a05bf91ae57e
was not sufficient, because it fell back to a hash comparison if the bitmap
of known encodings was not acceptable instead of directly returning the the
cached response is not compatible.

This patch also extends the reg-test to test the hash collision that was
mentioned in #988.

Vary handling is 2.4, no backport needed.
Thanks for this, but while your patch looks valid, the regtest fails
for me:

**** c1    http[ 0] |HTTP/1.1
**** c1    http[ 1] |200
**** c1    http[ 2] |OK
**** c1    http[ 3] |content-type: text/plain
**** c1    http[ 4] |vary: accept-encoding
**** c1    http[ 5] |cache-control: max-age=5
**** c1    http[ 6] |content-length: 48
**** c1    http[ 7] |age: 0
**** c1    http[ 8] |x-cache-hit: 1
**** c1    c-l|!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO
**** c1    bodylen = 48
**   c1    === expect resp.status == 200
**** c1    EXPECT resp.status (200) == "200" match
**   c1    === expect resp.bodylen == 45
---- c1    EXPECT resp.bodylen (48) == "45" failed

I don't know if it's the code or the test, but we're having an issue here
it seems. Note that this response has no content-encoding so there might
be something caused by this.

The specific accept-encoding test works but the regular vary.vtc does fail. But it might need a test change rather than a code change because it exhibits a use case we might not want to manage : If a server responds with a known encoding to a request that only has unknown encodings, we won't be able to answer directly from the cache, even if the accept-encoding header values match. This is caused by the "return (ref.encoding_bitmap & new.encoding_bitmap) != ref.encoding_bitmap" line in the accept_encoding_hash_cmp function. This is rather easy to do from a vtc but I don't know if this has a very high chance of happening on real networks. And considering that the alternative is a possible cache corruption because of hash collisions, we could probably add it as a known limitation.

Concerning the vary.vtc, the new behavior can be explained as follows :
Because of the absence of content-encoding in the second response of the vtc, its encoding is set to "identity", which is accepted by default by any client not rejecting it explicitely. So when looking into the cache for a response that might suit a request having an "Accept-Encoding: first_value" header, this specific one matches.

On a related note, I think we're still having an issue that was revealed
by your discovery of the bug and the fix. Indeed, initially there was
only the hash for the list of values. The goal was to simply have an
easy to compare value for a set of known encodings. Now for the known
ones we have a bitmap. Thus we shouldn't let the known values contribute
to the hash anymore.

I remain convinced that now that we have the bitmap, we're probably
deploying too many efforts to allow caching of unknown encodings with
all the consequences it implies, and that just like in the past we would
simply not cache when Vary was presented, not caching when an unknown
encoding is presented would seem perfectly acceptable to me, especially
given the already wide list of known encodings. And when I see the test
using "Accept-Encoding: br,gzip,xxx,jdcqiab", this comforts me in this
idea, because my feeling here is that the client supports two known
encodings and other ones, if we find an object in the cache using no
more than these encodings, it can be used, otherwise the object must
be fetched from the server, and the response should only be cached if
it uses known encodings. And if the server uses "jdcqiab" as one of
the encodings I don't care if it results in the object not being cached.

It would not be that hard to discard responses that have an unknown encoding but for now the content-encoding of the response is only parsed when the response has a vary header (it could actually have been limited to a response that varies on accept-encoding specifically). And considering that we probably don't want to parse the content-encoding all the time, responses with an unknown encoding would only be considered uncacheable when they have a vary, and would be cached when they don't. I'm totally fine with it if you are too, it would only require a few lines of explanation in the doc.

Just my two cents,
Willy


Rémi

Reply via email to