Hi! (“Sorry for the long delay” is officially my motto at this point.)

Christopher Baines <[email protected]> wrote:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>   Default success TTL => 36 hours
>   Negative TTL => 1 hour
>   Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL, I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.

You’re looking at the default TTLs, which are not the actual TTLs.
Specifically, servers can include a ‘Cache-Control’ header in their reply
specifying the TTL of their choice, and ‘guix substitute’ honors that:

  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371

‘guix publish’ returns 404 with a TTL of 5 minutes when the requested item
is in the store but needs to be “baked”.

However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
item is not in the store. In that case, clients use
‘%narinfo-negative-ttl’ (1h).

> I had a look through the Git history, caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.

Transient error caching was most likely added in the days of
hydra.gnu.org, that VM that was extremely slow. When overloaded, you’d
get 500 or similar, and at that point it was safer for clients to wait
and come back later, possibly much later. :-)

> Personally I don't see a reason to keep either behaviours?

The main arguments for these negative TTLs are:

  1. Reducing server load: if the server doesn’t have libreoffice, don’t
     come back asking every 10s, it’s prolly useless. You could easily
     have “GET storms” for libreoffice if clients don’t restrain
     themselves.

  2. Improving client performance: don’t GET things that are likely to
     fail.

Now, the penalty it imposes is annoying. I’ve sometimes found myself
working around it, too (because I knew the server was going to have the
store item sooner than 1h).

Rather than removing it entirely, I can think of these options:

  1. Reduce the default negative timeouts.

  2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
     send a ‘Cache-Control’ header with the chosen TTL on 404. That way,
     if the server operator doesn’t mind extra load, they can run
     “guix publish --negative-ttl=0”.

WDYT? Does that make any sense?

Ludo’.
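
PS: In case it helps the discussion, here is a rough Python sketch of the TTL selection described above. It is not the Scheme code in guix/substitutes.scm. The function and constant names are made up for illustration, and it assumes that the server's TTL arrives as the ‘max-age’ directive of ‘Cache-Control’ and that it is only consulted for 200 and 404 replies.

```python
import re
from typing import Optional

# Default TTLs quoted in this thread, in seconds.
DEFAULT_SUCCESS_TTL = 36 * 3600          # 36 hours
DEFAULT_NEGATIVE_TTL = 1 * 3600          # 1 hour, i.e. %narinfo-negative-ttl
DEFAULT_TRANSIENT_ERROR_TTL = 10 * 60    # 10 minutes


def max_age(cache_control: Optional[str]) -> Optional[int]:
    """Return the 'max-age' value of a Cache-Control header, or None."""
    if cache_control is None:
        return None
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def narinfo_ttl(status: int, cache_control: Optional[str] = None) -> int:
    """Pick how long (in seconds) a client caches a narinfo reply.

    Hypothetical helper: a server-provided TTL wins over the defaults
    for 200 and 404; any other status is treated as a transient error
    (an assumption of this sketch).
    """
    server_ttl = max_age(cache_control)
    if status == 200:
        return DEFAULT_SUCCESS_TTL if server_ttl is None else server_ttl
    if status == 404:
        return DEFAULT_NEGATIVE_TTL if server_ttl is None else server_ttl
    return DEFAULT_TRANSIENT_ERROR_TTL


if __name__ == "__main__":
    print(narinfo_ttl(404))                  # item not in store, no header
    print(narinfo_ttl(404, "max-age=300"))   # item being "baked"
    print(narinfo_ttl(404, "max-age=0"))     # what option 2 would allow
    print(narinfo_ttl(504))                  # transient error
```

Under option 2, a server started with the proposed “--negative-ttl=0” would correspond to the ‘max-age=0’ case: clients would not cache the 404 at all and would ask again on the next attempt.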
