There are significant performance challenges to having millions of tiny (<100 bytes) objects, no matter what type of caching is used. If the OP pursues the memcached path, he should consider that he may be better off grouping the data into larger objects.
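To make "grouping the data into larger objects" concrete, here is a rough sketch of one way it could be done: hash each tiny item's key into one of a fixed number of buckets and cache the bucket as a single object. Everything named here (NUM_BUCKETS, bucket_key, put_item, get_item) is made up for illustration, and a plain dict stands in for the memcached client, so treat it as an idea to test rather than a recommended design.

```python
import zlib

NUM_BUCKETS = 1024  # hypothetical value; tune so each bucket is a few KB


def bucket_key(item_key):
    """Map a tiny item's key to the key of the larger object holding it."""
    digest = zlib.crc32(item_key.encode("utf-8"))
    return "bucket:%d" % (digest % NUM_BUCKETS)


def put_item(cache, item_key, value):
    """Read-modify-write of the whole bucket. Not atomic: with a real
    shared cache, concurrent writers could overwrite each other."""
    bkey = bucket_key(item_key)
    bucket = cache.get(bkey) or {}
    bucket[item_key] = value
    cache[bkey] = bucket


def get_item(cache, item_key):
    """Fetch the bucket, then pick the item out of it (None if absent)."""
    bucket = cache.get(bucket_key(item_key)) or {}
    return bucket.get(item_key)


cache = {}  # stand-in for a memcached client (assumption)
for i in range(10000):
    put_item(cache, "user:%d" % i, i)

print(len(cache), "cached objects hold 10000 items")
```

The trade-off is that every read pulls a whole bucket over the wire and every write rewrites one, so this only pays off if items in a bucket tend to be read together or the per-object overhead really dominates.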
On May 12, 2:37 pm, Henrik Schröder wrote:
> I would actually say that it's not the size of the data that's relevant
> here, but you're saying that you don't need a synchronized cache, and you
> also said that generating the cached data is fairly cheap, so in your case
> local caches with short expiration times would be much better for you.
>
> Memcached will work fine for you, it will be much faster than going to the
> db every time, but from your description your application can't really
> benefit from the main points of memcached, i.e. that it is a distributed
> synchronized cache. It works best when you have objects that are expensive
> to create, but can be cached for a long time and properly invalidated or
> updated in the cache, and where you need the cache to be synchronized.
>
> I would definitely try a local cache as well in your case and measure that
> against your memcached implementation.
>
> /Henrik
>
> On Tue, May 12, 2009 at 20:58, JonB wrote:
>
> > On May 12, 6:31 pm, Henrik Schröder wrote:
> > > Yes, but how would you do cache invalidations?
> > >
> > > Right now in your existing application, you don't have to worry about
> > > when the underlying data changes, since all reads go to the one and
> > > only storage of this data. But if you add a cache layer, you have to
> > > start worrying about cache invalidation. If you have a local cache,
> > > and one machine changes the underlying data, you somehow have to tell
> > > all your other machines to refresh parts of their caches as well. How
> > > would you do that? How much would that mechanism cost?
> >
> > If the data was cached locally by the application, it would use the
> > same system as memcached, i.e. the data is given an 'expire by time'
> > when it should be discarded - on that machine. This will obviously be
> > different on each machine, so each machine would discard it at
> > different times, but so long as the data is invalidated, say 5 minutes
> > after it was last read - that isn't an issue.
> >
> > The application doesn't care if 'Machine A' changes data that's in
> > 'Machine B's cache - so long as it knows 'Machine B's cache will only
> > persist for 5 minutes.
> >
> > > On the other hand, if you use memcached, the cache is shared, so if
> > > one machine changes what's in memcached, all other machines will pick
> > > up the new values immediately. This way, updates are cheaper, but at
> > > a higher read cost. What's the ratio of updates/reads in your
> > > application?
> > >
> > > Or is cache invalidation irrelevant for your application? Will it run
> > > fine with stale data?
> >
> > It's not quite irrelevant, it's just 'very tolerant' of stale data.
> >
> > At the end of the day it comes down to:
> >
> > - Do I make a single function call, to return a few bytes of data
> > from a 'global slab' of RAM on the local machine.
> >
> > or,
> >
> > - Do I make a memcached call to retrieve what could be a few bytes of
> > data, from the memcache 'system'.
> >
> > Both will do what I want - I guess what I'm looking for is for someone
> > to say either:
> >
> > "It sounds like, for the small data sets you're working with there won't
> > be much benefit to using memcache - when you take into account the
> > processing/network overheads etc."
> >
> > or,
> >
> > "The overheads aren't that great - they'll still be a huge amount less
> > than hitting the MySQL server, they'll still save the server for more
> > write bandwidth, and at least it'll scale and give a coherent cache
> > view across 'n' number of machines".
> >
> > I think I'll just install it - see how easy it is to integrate, and
> > then run some test loads through the system and see what it does... I
> > have no doubt it'll make the MySQL server's life easier (shielding it
> > from the bulk of several hundred SELECTs a second) - and if the front
> > end is still spending more time 'doing useful work' than the overhead
> > of using the cache, we'll go with it.
> >
> > Phew - I think I just answered the question :)
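For anyone wanting to try the local-cache option discussed in the quoted thread, here is a minimal sketch of a per-process cache whose entries expire a fixed time after they are stored. The class name LocalTTLCache, the loader callback, and load_from_db are all hypothetical, and the loader stands in for the real MySQL query. Note that it expires entries a fixed time after they were loaded, not after they were last read, which is one possible reading of the "5 minutes" rule in the thread.

```python
import time


class LocalTTLCache:
    """Per-process cache; each entry is discarded ttl_seconds after load."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._data = {}     # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on miss or expiry."""
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._data[key] = (now + self.ttl, value)
        return value


def load_from_db(key):
    # Hypothetical stand-in for the real database lookup.
    return "row for " + key


local_cache = LocalTTLCache(ttl_seconds=300)  # 5 minutes, as in the thread
print(local_cache.get("user:1", load_from_db))  # first call hits the loader
print(local_cache.get("user:1", load_from_db))  # second call is served locally
```

Since each machine expires entries on its own schedule, two machines can disagree for up to the TTL, which is fine only for data that tolerates being stale. Running the same test load against this and against a memcached-backed version would give the comparison suggested in the thread.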
