Could be a vary issue, can you post the output of `varnishlog -d -g request -q 'RespStaus eq 500'? Please anonymize the bits you don't want us to see by replacing them with "XXXXXX", but don't simply remove them.
In the meantime, here's a cheat sheet to get started with varnishlog: https://docs.varnish-software.com/tutorials/vsl-query/ -- Guillaume Quintard On Thu, Dec 16, 2021 at 12:15 PM Batanun B <bata...@hotmail.com> wrote: > Hi, > > One of our websites usually has quite a low but steady stream of visitors, > but occationally we get a sudden surge of requests over a very short time > period (about 1-2 minutes). Varnish seems to handle the load fine, but the > backends struggle with this. But I have noticed that Varnish doesn't serve > the stale cached data, but instead shows our synthetic 500 page. This is > true even for the start page, that definitely existed in the cache. And we > have a grace period of 10 days, so I it's quite annoying that we can't > simply serve the stale cached data during this short period. > > I have tried picturing the entire flow, following the logic in the vcl, > but I can't see what we do wrong. And annoyingly I can't reproduce the > problem locally by simply shutting down the backends (or setting them to > unhealthy), because whenever I do that I get the stale content served just > as intended. Could the sheer volume itself cause this, making it impossible > to reproduce by simply fetching the page a few times in the browser before > and after disabling the backends? Or is there some edge case that I haven't > thought of that is causing this? > > A simpliedfied version of our vcl is included below, with only the > relevant parts. But unless I have some blatent problem with the vcl, I > think it would be good if I learned how to troubleshoot this using the > Varnish tools, like varnishlog. So that next time this happens, I can use > varnishlog etc to see what's happening. > > Is it possible using varnishlog to find the very first request for a > specific path ("/" in our case) where it returned the synthetic 500 and put > it in the cache? And is it also possible to find the request just before > that one, for the same path. If I could extract those two requests > (including all the metadata in those transactions) from the djungle of > thousands of requests, then maybe I can find some explanation why the > second request doesn't return the stale data. > > ----------------------------- > > sub vcl_hit { > if (obj.ttl > 0s) { > // Regular cache hit > return (deliver); > } else if (req.restarts == 0 && std.healthy(req.backend_hint)) { > // Graced cache hit, first attempt. > // Force cache miss to trigger fetch in foreground (ie synchronous > fetch). > set req.http.X-forced-miss = "true"; > return (miss); > } else { > // Graced cache hit, previous attempts failed (or backend > unhealthy). Let the fetch happen in the background (ie asynchronous fetch), > and return the cached value. > return (deliver); > } > } > > sub vcl_recv { > if (req.http.X-cache-pass == "true") { > return(pass); > } > > set req.grace = 240h; // 10 day grace > } > > sub vcl_backend_response { > if (bereq.http.X-cache-pass != "true") { > if (beresp.status < 400) { > set beresp.grace = 240h; > set beresp.ttl = 30m; // Cache invalidation in the form of xkey > softpurge can put objects into grace before the TTL is past. > } else if (beresp.status < 500) { > set beresp.ttl = 1m; > return (deliver); > } else if (beresp.status >= 500) { > // In some cases we want to abandon the backend request on > 500-errors, since it otherwise would overwrite the cached object that still > is usefull for grace. > // This will make it jump to vcl_synth with a 503 status. > There it will restart the request. > if (bereq.is_bgfetch) { > // In grace period. Abandoning 5xx request, since it > otherwise would overwrite the cached object that still is usefull for grace > return (abandon); > } else if (bereq.http.X-forced-miss == "true") { > return (abandon); > } > // Non background fetch, ie no grace period (and no stale > content available). Cache the error page for a few seconds. > set beresp.ttl = 15s; > return (deliver); > } > } > } > > sub vcl_synth { > if (req.http.X-forced-miss == "true" && resp.status >= 500) { > return (restart); > } > } > > sub vcl_backend_error { > if (bereq.is_bgfetch || bereq.http.X-forced-miss == "true") { > // By calling abandon, it will jump to vcl_synth with a 503 > status. There it will restart the request. > return (abandon); > } > set beresp.http.Content-Type = "text/html; charset=utf-8"; > set beresp.ttl = 15s; > synthetic(std.fileread("/etc/varnish/500.html")); > return (deliver); > } > > ----------------------------- > _______________________________________________ > varnish-misc mailing list > varnish-misc@varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >
_______________________________________________ varnish-misc mailing list varnish-misc@varnish-cache.org https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc