Well. Uh.

Sometimes ya just can't, yaknow? :)

In an ideal world your cache never expires, memcached never flaps, and you have background tools updating caches that work flawlessly. Unfortunately developers don't always have the time to make all of that perfect, but it's "easy" to patch in one of the previous suggestions to deal with the problem.

Let's say you're a typical startup and you're faced with a problem:

You have a complex bit of parsing code for special data. No code was ever written to update this data automatically or programmatically. Sometime later, caching gets added. It's easy: cache the result of the parse in memcached and let it expire once per minute, so that changes (made by hand, rarely) still propagate. It's wrong, but it's what happened.
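In code the pattern looks roughly like this. A minimal sketch, assuming the PECL Memcached client; parse_special_data() is a made-up stand-in for the expensive parser:

  <?php
  // Minimal sketch of the 'cache the parse, expire every minute' pattern.
  // parse_special_data() is hypothetical; substitute the real parsing code.
  $mc = new Memcached();
  $mc->addServer('localhost', 11211);

  function get_special_data(Memcached $mc) {
      $data = $mc->get('special_data');
      if ($data === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
          // Every client that lands here re-runs the parser at once:
          // this is the stampede when the key expires under load.
          $data = parse_special_data();
          $mc->set('special_data', $data, 60); // expire once per minute
      }
      return $data;
  }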

Fixing this properly takes someone a nontrivial amount of time: write code to serialize the data into the DB, write a CLI tool or webpage to manage the data, and update the cache after the data is edited (by hand, or whatever). Good luck getting your boss to sign off on that. There are magic widgets that aren't writing themselves!

I do realize there's an easy way to update the data by hand and then run a tool to refresh the cache, but that's beside the point ;) Imagine again that you have a lot of these situations, where caching was plugged in as an afterthought. For new development I _always_ recommend a cron to update the data (getting PHP devs to write crons is like pulling teeth!), or a tool that updates the cache, along the lines of the sketch below. It doesn't always happen.
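For what it's worth, the cron version doesn't have to be much: a crontab line plus a tiny refresher script. The script name and parse_special_data() are invented for the example:

  # crontab: rebuild the cached copy every minute, out-of-band
  * * * * * php /path/to/refresh_special_data.php

  <?php
  // refresh_special_data.php: hypothetical refresher run from cron.
  // Because the rebuild happens out-of-band, web requests never see a miss.
  $mc = new Memcached();
  $mc->addServer('localhost', 11211);
  $mc->set('special_data', parse_special_data(), 0); // 0 = no expiration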

At Gaia it's also common to see this happen where simple 'query caching' was plugged in as the caching methodology: everywhere there's SQL that's relatively static, adding an ->enableCache(blah) call makes it faster! Right? Right... It turns out you can also plug in one of the aforementioned algorithms to mitigate this brain damage.
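As a rough illustration of one such algorithm (the 'one client rebuilds, everyone else serves stale' variant): the key layout, TTLs, and run_query() here are made up for the example, not how Gaia's wrapper actually works:

  <?php
  // Sketch: store a soft expiry inside the value, never hard-expire the
  // item, and use add() as a cheap lock so only one client rebuilds.
  function cached_query(Memcached $mc, $sql) {
      $key  = 'sql:' . md5($sql);
      $item = $mc->get($key);
      $now  = time();
      if ($item !== false && $item['soft_expires'] > $now) {
          return $item['value'];                 // still fresh
      }
      if ($mc->add($key . ':lock', 1, 30)) {     // we won the rebuild
          $value = run_query($sql);              // hypothetical DB call
          $mc->set($key, array('value' => $value,
                               'soft_expires' => $now + 60), 0);
          $mc->delete($key . ':lock');
          return $value;
      }
      if ($item !== false) {
          return $item['value'];                 // stale, but no stampede
      }
      return run_query($sql);                    // true cold miss
  }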

Also, if you have a cluster configured not to auto-rehash, and memcached instances can stay down for multiple minutes during a failure, you'll get a similar stampeding problem anyway. At that point you should just cache the data in APC as well.
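One way to read the APC suggestion (assuming the APC user cache, apc_fetch/apc_store, is loaded) is to keep a per-node copy so a dead memcached pool doesn't turn every request into a recompute:

  <?php
  // Sketch: refresh a local APC copy on every successful memcached read,
  // and fall back to it when memcached is unreachable or the key is gone.
  function get_with_apc_fallback(Memcached $mc, $key) {
      $val = $mc->get($key);
      if ($val !== false) {
          apc_store('local:' . $key, $val, 300); // refresh the local copy
          return $val;
      }
      $local = apc_fetch('local:' . $key);
      if ($local !== false) {
          return $local;  // memcached flapping; serve the per-node copy
      }
      return false;       // genuinely cold; caller recomputes
  }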

Well. In summary: you're right. On that note, I realize that in all the wiki updatery I hadn't really stressed what you mentioned nearly enough. It's there, but it's not a highlight.

-Dormando

Steven Grimm wrote:
I admit I'm a bit baffled by this discussion (and I also admit I have only been skimming it, so this might be a retread.) It seems like one of two situations should be true:

1. The underlying data has not changed. The cache is therefore still correct.

2. The underlying data has changed, and the cache is now stale.

In the first case, just don't set an expiration time and you're done, yes? Since the item is frequently hit (hence the stampedes) it will never get LRUed out.

In the second case, why are you waiting around for some unknown amount of time to pass -- and for some client to get an actual cache miss -- before refreshing the cache? If you have a few hot keys that change often but for whatever reason you can't invalidate / update the cache at the time the underlying data gets updated, then another approach is to have some background task periodically updating the hot items to their current values. Again, you don't let the item expire in this scenario; it just gets updated every once in a while. This way nobody has to deal with a cache miss, and the values still stay as current as you want them to (adjust the frequency of your background task's updates to taste) with no stampedes.

What am I missing?

-Steve
