On 05/05/13 16:32, Maxim Dounin wrote:
Hello!

On Sat, May 04, 2013 at 07:08:55PM -0400, Jim Ohlstein wrote:

[...]

I have just seen a similar situation using fastcgi cache. In my case
I am using the same cache (but only one cache) for several
server/location blocks. The system is a fairly basic nginx set up
with four upstream fastcgi servers and ip hash. The returned content
is cached locally by nginx. The cache is rather large but I wouldn't
think this would be the cause.

[...]

     fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
keys_zone=one:512m max_size=250g inactive=24h;

[...]

The other server/location blocks are pretty much identical insofar as
fastcgi and cache are concerned.

When I upgraded nginx using the "on the fly" binary upgrade method,
I saw almost 400,000 lines in the error log that looked like this:

2013/05/04 17:54:25 [crit] 65304#0: unlink()
"/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed
(2: No such file or directory)

[...]

After binary upgrade there are two cache zones - one in old nginx,
and another one in new nginx (much like in originally posted
configuration).  This may cause such errors if e.g. a cache file
is removed by old nginx, and new nginx fails to remove the file
shortly after.

The 400k lines is a bit too many though.  You may want to check
that the cache wasn't just removed by some (package?) script
during the upgrade process.  Alternatively, it might indicate that
you let old and new processes coexist for a long time.

I hadn't considered that there are two zones during that short time. Thanks for pointing that out.

To my knowledge, there are no scripts or packages which remove files from the cache, or the entire cache. A couple of minutes after this occurred there were a bit under 1.4 million items in the cache and it was "full" at 250 GB. I did look in a few sub-directories at the time, and most of the items were time stamped from before this started so clearly the entire cache was not removed. During the time period these entries were made in the error log, and in the two minutes after, access log entries show the expected ratio of "HIT" and "MISS" entries which further supports your point below that these are harmless (although I don't really believe that I have a cause).
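For anyone who wants to repeat the item count and size check described above, here is a minimal sketch. It assumes the cache directory from the fastcgi_cache_path line quoted earlier; the function name cache_usage is made up for illustration. It sums apparent file sizes, which may not match exactly how nginx accounts for max_size, so treat the total as a rough figure.

```python
import os


def cache_usage(root):
    """Return (file_count, total_bytes) for regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except FileNotFoundError:
                # A cache file can disappear between listing and stat
                # while nginx is running; skip it.
                continue
            count += 1
            total += st.st_size
    return count, total


if __name__ == "__main__":
    # Path taken from the fastcgi_cache_path directive in this thread.
    files, size = cache_usage("/var/nginx/fcgi_cache")
    print(f"{files} files, {size / 1e9:.1f} GB")
```

On a cache with over a million entries the walk takes a while and adds disk I/O, so it is better run outside peak hours.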

I'm not sure what you mean by a "long time", but all of these entries are time stamped over roughly two and a half minutes.
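In case it is useful to others, the count and time span of these entries can be pulled from the error log with something like the sketch below. The regular expression is an assumption based only on the single log line quoted above (unwrapped to one line), and the function name unlink_failure_span is hypothetical. The log path is passed on the command line since it depends on the error_log setting.

```python
import re
import sys
from datetime import datetime

# Pattern modeled on the one error line quoted in this thread (assumption).
UNLINK_RE = re.compile(
    r'^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[crit\] \d+#\d+: '
    r'unlink\(\) "[^"]*" failed \(2: '
)


def unlink_failure_span(lines):
    """Return (count, first_timestamp, last_timestamp) for ENOENT unlink errors."""
    count = 0
    first = None
    last = None
    for line in lines:
        m = UNLINK_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        count += 1
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return count, first, last


if __name__ == "__main__":
    # Usage: python script.py /path/to/error.log
    with open(sys.argv[1], errors="replace") as log:
        n, start, end = unlink_failure_span(log)
    if n:
        print(f"{n} entries between {start} and {end} ({end - start})")
    else:
        print("no matching entries")
```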


On the other hand, as discussed many times - such errors are more
or less harmless as soon as it's clear what caused cache files to
be removed.  At worst they indicate that information in a cache
zone isn't correct and max_size might not be maintained properly,
and eventually nginx will self-heal the cache zone.  It probably
should be logged at [error] or even [warn] level instead.


Why would max_size not be maintained properly? Isn't that the responsibility of the cache manager process? Are there known issues/bugs?

Thank you for your response and assistance.


--
Jim Ohlstein

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
