Well, this all depends on the cost of creating the data and how volatile it is. If it doesn't cost much to create, there's little reason to cache it. If it's expensive, then there's good reason. If the expensive parts are non-volatile, then there's a good reason to split it up. If it's all fairly static and all aspects are needed, then you can get away with a single entry. (A problem with highly split data sets is determining the keys needed to do the multi-get, which can be resolved by also caching the key sets.)
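For example, something like the rough sketch below: it caches the member-key list under its own key and feeds it to the multi-get. This is just an illustration using the spymemcached client; the "user:*" key layout, the TTLs, and the buildKeysFor() helper are made up, not taken from your code.

import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import net.spy.memcached.MemcachedClient;

public class KeySetCacheExample {
    private final MemcachedClient cache;

    public KeySetCacheExample(MemcachedClient cache) {
        this.cache = cache;
    }

    /**
     * Loads the fine-grained entries for one user. The list of member keys is
     * itself cached under "user:<id>:keys" so we don't have to recompute which
     * keys to hand to the multi-get.
     */
    @SuppressWarnings("unchecked")
    public Map<String, Object> loadUser(long userId) {
        String keySetKey = "user:" + userId + ":keys";
        List<String> keys = (List<String>) cache.get(keySetKey);
        if (keys == null) {
            // Miss on the key set: rebuild it (e.g. from the object model)
            // and store it alongside the data it points at.
            keys = buildKeysFor(userId);
            cache.set(keySetKey, 3600, new ArrayList<String>(keys));
        }
        return cache.getBulk(keys);   // one round trip for all the fragments
    }

    private List<String> buildKeysFor(long userId) {
        List<String> keys = new ArrayList<String>();
        keys.add("user:" + userId + ":profile");
        keys.add("user:" + userId + ":preferences");
        keys.add("user:" + userId + ":billing");
        return keys;
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient client =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));
        KeySetCacheExample example = new KeySetCacheExample(client);
        System.out.println(example.loadUser(42L));
        client.shutdown();
    }
}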
For serialization, you could maintain the SUID to resolve conflicts if you want to share data between versions. If the cache is large enough, you could just prefix the build # to your keys, as Brian mentioned.

The problem with that approach is if you have a central server that rewarms the remote cache on major change events and then sends a refresh message so your application servers reload their caches (thus only the notifier hits the database). In that case you'll have a lot of misses unless you transition to a distributed rewarming approach (e.g. a distributed lock per cache+build#, so the first to lock rewarms while the rest wait and then just refresh); there's a rough sketch of that at the bottom of this message, below Marc's original mail. We're probably one of the few places that have that type of scenario, since application behavior changes dramatically based on company policies and no one system has the entire code base; instead we run fine-grained services.

You should probably first look at how well your local caching is doing (if you have any), and then build out the remote layer as needed. Often the application code is just dumb, and when it's properly written with local caches you get a sizable performance boost. Unless you're in a special case, it's usually best not to rely on the remote cache for application performance but rather to use it to resolve database performance issues (e.g. high CPU utilization).

----- Original Message ----
From: marc2112 <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, December 13, 2007 8:43:33 AM
Subject: coarse-grained or fine-grained?

Hi All,

Working on designing a caching layer for my website, I wanted to get some opinions from memcached users. There are two issues I'm hashing through:

1) Level of granularity to cache data at
2) Version compatibility across software releases

The primary applications that would be using the cache are developed in Java and utilize a smallish (~20 classes) domain object model. In a few use cases, as you could imagine, we only need a few attributes from 2 or 3 different domain objects to service a request.

How granular is the data that folks are typically putting into memcached? Since there is support for batched gets, it would seem like one option at the farthest end of the spectrum would be to cache each attribute separately. I could see there being a lot of overhead on puts in this case, and it's probably not so efficient overall. The other end of the spectrum would be to cache one object that references all of the other related data, often reading more data than we need to from the cache.

The last consideration I'm thinking through in all of this is how to manage serializable class versioning. Do people generally take an optimistic approach here and, if there is a serialization exception on read, just replace what's in the cache? Or do you include a class version indicator as part of the key? If it's part of the key, how do you make sure that there aren't two live versions with potentially different attribute values in the cache?

Thanks for your thoughts,
---Marc
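Here's the sketch I mentioned of the build-number prefix plus the "first to lock rewarms" idea, again against spymemcached. It leans on memcached's add() only succeeding for one caller, which is a common lock idiom; the build constant, key names, TTLs, loadFromDatabase() placeholder, and the sleep-and-retry are all just stand-ins for illustration, not how we actually wire it up.

import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class BuildAwareRewarm {
    private static final String BUILD = "b1234";   // in practice injected at release time
    private final MemcachedClient cache;

    public BuildAwareRewarm(MemcachedClient cache) {
        this.cache = cache;
    }

    /** Namespaces every key by build number so two releases never share entries. */
    private String key(String logicalKey) {
        return BUILD + ":" + logicalKey;
    }

    /**
     * First caller to grab the lock rewarms the entry from the database;
     * everyone else waits briefly and re-reads, so only one node hits the DB.
     */
    public Object getOrRewarm(String logicalKey) throws Exception {
        String k = key(logicalKey);
        Object value = cache.get(k);
        if (value != null) {
            return value;
        }
        String lockKey = "lock:" + k;
        // add() succeeds only if the key doesn't already exist, so exactly one
        // caller wins the lock; the 30s expiry keeps a crashed winner from
        // wedging everyone else forever.
        boolean gotLock = cache.add(lockKey, 30, "1").get();
        if (gotLock) {
            try {
                value = loadFromDatabase(logicalKey);
                cache.set(k, 3600, value);
            } finally {
                cache.delete(lockKey);
            }
        } else {
            Thread.sleep(200);           // crude wait; a real version would poll/back off
            value = cache.get(k);
        }
        return value;
    }

    private Object loadFromDatabase(String logicalKey) {
        // placeholder for the expensive load this scheme is trying to serialize
        return "value-for-" + logicalKey;
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient client =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));
        BuildAwareRewarm rewarm = new BuildAwareRewarm(client);
        System.out.println(rewarm.getOrRewarm("policy:acme"));
        client.shutdown();
    }
}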
