Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities

Issac Goldstand Wed, 20 Sep 2006 06:33:56 -0700


Graham Leggett wrote:

Niklas Edmundsson wrote:
However, I don't see how you can do a lockless design with multiple files and an index that can do:
* Clients read from the cache as files are being cached.
* Only one session caches the same file.
* Header/Body updates.
* No index/files out-of-sync issues. Ever.
Thinking about this some more I do see a race during purging - a cache thread could read the header, the purge deletes header and body, and then the cache thread reads the body, and interprets the missing body as "the body is still coming".
One possible (and reasonably simple) solution would be to cache the header and body in a unique directory - the directory name becomes the key, and the entry is either cached completely / still being cached if the directory exists. This assumes it's possible to atomically delete directories.

I don't understand why bother getting so complex. Touch/truncate the body file when storing the header, and then a missing body means things have gone amok - retry the request. Conversely, a zero-length, or < C-L body length means another thread is working on the body.
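
The three cases above can be written down as a small check. This is only a sketch of the idea, not how any cache provider actually does it: `BodyState` and `body_state` are made-up names, and `content_length` is assumed to have been read from the already-stored header.

```python
import os
from enum import Enum


class BodyState(Enum):
    MISSING = "missing"          # things have gone amok: retry the request
    IN_PROGRESS = "in_progress"  # another thread is still writing the body
    COMPLETE = "complete"        # body has reached the expected length


def body_state(body_path, content_length):
    """Classify a cached body by comparing its size to Content-Length."""
    try:
        size = os.stat(body_path).st_size
    except FileNotFoundError:
        return BodyState.MISSING
    if size < content_length:
        return BodyState.IN_PROGRESS
    return BodyState.COMPLETE
```

A reader that gets `IN_PROGRESS` would presumably wait and look again; how long to wait before giving up on a stalled writer is not covered here.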

Another option is to version the filename of the body based on a key in the header. In other words, in the header, called <key>.header, is a version number <timestamp>, meaning there should be a body called <key>.<timestamp>.body. A replacement cached entry therefore cannot stomp on what pre-existing threads are doing. If the body file is created first, before the header file, then a nonexistent body file means "this entry has been invalidated, try the request again".
There is an assumption that <timestamp> is fine grained enough to be unique.
You're right, this is a tricky one, but there is a solution out there.
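
A rough sketch of the versioned-body naming, to make the ordering concrete. Everything here is an assumption for illustration: `store_entry` and `open_body` are made-up names, the header is reduced to a file holding only the version string, and the header is published with an atomic rename (`os.replace`), which the thread does not discuss. For brevity the whole body is written before the header is published, so reading while caching is not shown.

```python
import os
import time


def store_entry(cache_dir, key, body, version=None):
    """Write <key>.<version>.body first, then publish <key>.header."""
    if version is None:
        # Assumed to be fine grained enough to be unique.
        version = str(time.time_ns())
    body_path = os.path.join(cache_dir, f"{key}.{version}.body")
    with open(body_path, "wb") as f:
        f.write(body)
    # Write the header under a temporary name, then rename it into place
    # so readers never see a half-written header.
    tmp_path = os.path.join(cache_dir, f"{key}.header.{version}.tmp")
    with open(tmp_path, "w") as f:
        f.write(version)
    os.replace(tmp_path, os.path.join(cache_dir, f"{key}.header"))
    return version


def open_body(cache_dir, key):
    """Return the open body file, or None if the entry was invalidated."""
    try:
        with open(os.path.join(cache_dir, f"{key}.header")) as f:
            version = f.read().strip()
        return open(os.path.join(cache_dir, f"{key}.{version}.body"), "rb")
    except FileNotFoundError:
        return None
```

A reader that opened the old body keeps reading the old version even after a replacement is stored, since the replacement lands in a differently named file. Cleaning up superseded body files is left out of the sketch.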

Maybe we're attacking the problem from the wrong angle. Rather than modifying mod_cache, modify the garbage-collector (e.g., htcacheclean). Do a two pass cleanup. The first pass is a data-store traversal pass which decides what to remove. It immediately purges the header file, and stores the entity key (or filename, or whatever it needs to re-access the entity) in a list. Once the first pass finishes, a second pass is made leisurely cleaning up all of the entities that are still missing their header files (that way, if a mod_cache thread re-caches the entity, we won't purge it).

That should be a safe solution, provided that the time taken to perform the first pass is longer than the time between opening the header and body files. That should normally be the case, unless someone can come up with a reasonable case where it wouldn't be so?
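
To make the two passes concrete, here is a minimal sketch. It is not htcacheclean's code: `first_pass`, `second_pass`, and the `should_purge` callback are made-up names, and a flat directory of `<key>.header` / `<key>.body` files is assumed purely for illustration.

```python
import os


def first_pass(cache_dir, should_purge):
    """Remove the header of every entry chosen for purging; return keys."""
    pending = []
    for name in sorted(os.listdir(cache_dir)):
        if not name.endswith(".header"):
            continue
        key = name[:-len(".header")]
        if not should_purge(key):
            continue
        try:
            os.remove(os.path.join(cache_dir, name))
        except FileNotFoundError:
            continue  # header vanished in the meantime; skip this entry
        pending.append(key)
    return pending


def second_pass(cache_dir, pending):
    """Delete bodies whose header is still missing; return purged keys."""
    purged = []
    for key in pending:
        if os.path.exists(os.path.join(cache_dir, key + ".header")):
            continue  # re-cached since the first pass: leave it alone
        try:
            os.remove(os.path.join(cache_dir, key + ".body"))
        except FileNotFoundError:
            pass  # body already gone
        purged.append(key)
    return purged
```

The check-then-delete in `second_pass` is not atomic, so a re-cache that lands between the header check and the body removal can still lose its body; the sketch only narrows the window, it does not close it.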


 Issac
