> I'm interested to know what the opinions are of those on this list with
> regards to caching objects during database write operations. I've
> encountered different views and I'm not really sure what the best
> approach is.
I described some of my views on this in the article on the eToys design,
which is archived at perl.com.

> Take a typical caching scenario: Data/objects are locally stored upon
> loading from a database to improve performance for subsequent requests.
> But when those objects change, what's the best method for refreshing the
> cache? There are two possible approaches (maybe more?):
>
> 1) The old cache entry is overwritten with the new.
> 2) The old cache entry is expired, thus forcing a database hit (and
>    subsequent cache load) on the next request.
>
> The first approach would tend to yield better performance. However,
> there's no guarantee the data will ever be read. The cache could end up
> with a large amount of data that's never referenced. The second approach
> would probably allow for a smaller cache by ensuring that data is only
> cached on reads.

There are actually thousands of variations on caching. In this case you
seem to be asking about one specific aspect: what to cache. Another
important question is how to ensure cache consistency. The approach you
choose depends on the frequency of updates, single server vs. cluster,
etc.

There's a simple answer for what to cache: as much as you can, until you
hit some kind of limit or performance is good enough. Sooner or later you
will hit the point where the tradeoff in storage, or in time spent
ensuring cache consistency, will force you to limit your cache.

People usually use something like a dbm or Cache::Cache to implement
mod_perl caches, since then you get to share the cache between processes.
Storing the cache on disk means your storage is nearly unlimited, so
we'll ignore that aspect for now. There's a lot of academic research on
deciding what to cache in web proxy servers with a limited amount of
space, which you can look at if you have space limitations: lots of stuff
on LRU, LFU, and other popular cache expiration algorithms.
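The two approaches from the quoted question can be sketched in a few
lines. This is a toy in-process cache in Python rather than Perl, just to
show the shape of each strategy; all the names here (fetch_from_db, read,
and the write helpers) are hypothetical, and the actual database write is
elided.

```python
cache = {}

def fetch_from_db(key):
    # Stand-in for a real (expensive) database read.
    return "row-for-%s" % key

def read(key):
    # On a cache miss, hit the database and populate the cache,
    # so data is only cached when something actually reads it.
    if key not in cache:
        cache[key] = fetch_from_db(key)
    return cache[key]

def write_overwrite(key, value):
    # Approach 1: on a database write, overwrite the cache entry too.
    # Fast next read, but the entry may never be read again.
    # save_to_db(key, value)  # database write elided
    cache[key] = value

def write_expire(key, value):
    # Approach 2: on a database write, just drop the cache entry.
    # The next read takes a database hit and repopulates the cache.
    # save_to_db(key, value)  # database write elided
    cache.pop(key, None)
```

With approach 2 the cache only ever holds data that has been requested at
least once since its last update, which is why it tends to stay smaller.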
The limit you are more likely to hit is that it will start to take too
long to populate the cache with everything. Here's an example from eToys:

We used to generate most of the site as static files by grinding through
all the products in the database and running the data through a
templating system. This is a form of caching, and it gave great
performance. One day we had to add a large number of products that more
than doubled the size of our database. The time to generate all the
static files became prohibitive: our content editors wanted their updates
to appear within a certain number of hours, but generating everything
took longer than that.

To fix this, we moved to not generating anything until it was requested.
We would fetch the data the first time it was asked for, and then cache
it for future requests. (I think this corresponds to your option 2.) Of
course, then you have to decide on a cache consistency approach for
keeping that data fresh. We used a simple TTL approach because it was
fast and easy to implement ("good enough").

This is just scratching the surface of caching. If you want to learn
more, I would suggest some introductory reading. You can find lots of
general ideas about caching by searching Google for things like "cache
consistency." There are also a couple of good articles on the subject
that I've read recently. Randal has an article that shows an
implementation of what I usually call "lazy reloading":
http://www.linux-mag.com/2001-01/perl_01.html

There's one about cache consistency on O'Reilly's onjava.com, but all
the examples are in Java:
http://www.onjava.com/pub/a/onjava/2002/01/09/dataexp1.html

Also, in reference to Rob Nagler's post, it's obviously better to be in a
position where you don't need caching to get acceptable performance.
Caching adds a lot of complexity and causes problems that are hard to
explain to non-technical people. However, for many of us caching is a
necessity for decent performance.

- Perrin
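P.S. The lazy-reload-with-TTL idea described above is simple enough to
sketch. This is a minimal Python illustration under my own assumptions
(the names fetch, loader, and TTL_SECONDS are all invented for the
example): fetch the data the first time it's requested, serve it from the
cache until the entry is older than the TTL, then refetch.

```python
import time

TTL_SECONDS = 60 * 60  # one hour; pick whatever "fresh enough" means

_cache = {}  # key -> (value, stored_at)

def fetch(key, loader, now=None):
    # Return a cached value if it exists and is still fresh;
    # otherwise call the (expensive) loader and cache the result.
    # `now` is overridable to make the behavior easy to test.
    now = time.time() if now is None else now
    entry = _cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if now - stored_at < TTL_SECONDS:
            return value  # still fresh, no database hit
    value = loader(key)  # e.g. database query plus templating
    _cache[key] = (value, now)
    return value
```

Nothing is generated until it's requested, and stale entries are reloaded
only when someone actually asks for them, which is what made this cheap
enough to be "good enough."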