Hello!

On Mon, May 06, 2013 at 09:01:45AM -0400, Jim Ohlstein wrote:
> On 05/05/13 16:32, Maxim Dounin wrote:
> > Hello!
> >
> > On Sat, May 04, 2013 at 07:08:55PM -0400, Jim Ohlstein wrote:
> >
> > [...]
> >
> >> I have just seen a similar situation using fastcgi cache. In my
> >> case I am using the same cache (but only one cache) for several
> >> server/location blocks. The system is a fairly basic nginx setup
> >> with four upstream fastcgi servers and ip hash. The returned
> >> content is cached locally by nginx. The cache is rather large but
> >> I wouldn't think this would be the cause.
> >
> > [...]
> >
> >>     fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
> >>         keys_zone=one:512m max_size=250g inactive=24h;
> >
> > [...]
> >
> >> The other server/location blocks are pretty much identical insofar
> >> as fastcgi and cache are concerned.
> >>
> >> When I upgraded nginx using the "on the fly" binary upgrade method,
> >> I saw almost 400,000 lines in the error log that looked like this:
> >>
> >> 2013/05/04 17:54:25 [crit] 65304#0: unlink()
> >> "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed
> >> (2: No such file or directory)
> >
> > [...]
> >
> > After a binary upgrade there are two cache zones - one in the old
> > nginx, and another one in the new nginx (much like in the originally
> > posted configuration). This may cause such errors if e.g. a cache
> > file is removed by the old nginx, and the new nginx fails to remove
> > the file shortly after.
> >
> > The 400k lines is a bit too many, though. You may want to check
> > that the cache wasn't just removed by some (package?) script during
> > the upgrade process. Alternatively, it might indicate that you let
> > the old and new processes coexist for a long time.
>
> I hadn't considered that there are two zones during that short time.
> Thanks for pointing that out.
>
> To my knowledge, there are no scripts or packages which remove files
> from the cache, or the entire cache. A couple of minutes after this
> occurred there were a bit under 1.4 million items in the cache and it
> was "full" at 250 GB. I did look in a few sub-directories at the
> time, and most of the items were timestamped from before this
> started, so clearly the entire cache was not removed. During the time
> period these entries were made in the error log, and in the two
> minutes after, access log entries show the expected ratio of "HIT"
> and "MISS" entries, which further supports your point below that
> these are harmless (although I don't really believe that I have a
> cause).
>
> I'm not sure what you mean by a "long time", but all of these entries
> are timestamped over roughly two and a half minutes.

Is it ok in your setup that 400k cache items are removed/expired from
the cache in two minutes? If yes, then it's probably ok.

> > On the other hand, as discussed many times - such errors are more
> > or less harmless as long as it's clear what caused cache files to
> > be removed. At worst they indicate that the information in a cache
> > zone isn't correct and max_size might not be maintained properly,
> > and eventually nginx will self-heal the cache zone. It probably
> > should be logged at [error] or even [warn] level instead.
>
> Why would max_size not be maintained properly? Isn't that the
> responsibility of the cache manager process? Are there known
> issues/bugs?

The cache manager process uses the same shared memory zone to maintain
max_size. If nginx thinks a cache file is there, but the file was in
fact already deleted (which is why the alerts in question appear), the
total size of the cache as recorded in shared memory will be
incorrect. As a result, the cache manager will delete some extra files
to keep the (incorrect) size under max_size. In the worst case, the
cache size will be correct again once inactive= time has passed after
the cache files were deleted.
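If you want to watch the zone converge back to reality, a rough spot
check from the shell might help; this is just a sketch using the cache
path from your fastcgi_cache_path above, not anything nginx-specific:

    # Compare the on-disk size of the cache with the configured
    # max_size=250g; a large gap right after such alerts is consistent
    # with a stale size counter in the shared memory zone.
    du -sh /var/nginx/fcgi_cache

    # Count cached items; the zone tracks roughly one key per file, so
    # this should shrink as the manager evicts extra entries.
    find /var/nginx/fcgi_cache -type f | wc -l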
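For reference, this is the usual on-the-fly upgrade sequence that opens
the two-zone window discussed above; the sketch assumes the pid file
lives at /var/run/nginx.pid, adjust to your "pid" directive:

    # Start the new binary; the old master renames its pid file to
    # nginx.pid.oldbin, and two masters - and two cache zones -
    # run at the same time.
    kill -USR2 `cat /var/run/nginx.pid`

    # Once the new workers look fine, gracefully stop the old workers,
    kill -WINCH `cat /var/run/nginx.pid.oldbin`

    # and then the old master. Keeping this window short keeps the
    # number of unlink() alerts down.
    kill -QUIT `cat /var/run/nginx.pid.oldbin`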
-- 
Maxim Dounin
http://nginx.org/en/donation.html

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx