There are significant performance challenges to having millions of
tiny (<100 bytes) objects no matter what type of caching is used.  If
the OP were to pursue a memcached path he should take into
consideration that he may be better off grouping the data into larger
objects.
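
A rough sketch of what grouping could look like, in Python. Everything here is an assumption for illustration: the bucket count, the "bucket:N" key format, and the DictCache class, which is only an in-process stand-in for a real memcached client with get/set methods.

```python
import hashlib
import pickle

NUM_BUCKETS = 1024  # assumed value; pick it so each bucket holds a useful number of items


class DictCache:
    """Hypothetical in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value


def bucket_key(item_key):
    # Hash the item key so that many tiny items share one larger cache entry.
    digest = hashlib.md5(item_key.encode("utf-8")).hexdigest()
    return "bucket:%d" % (int(digest, 16) % NUM_BUCKETS)


def grouped_get(cache, item_key):
    # Fetch the whole bucket, then pick the single item out of it.
    blob = cache.get(bucket_key(item_key))
    if blob is None:
        return None
    return pickle.loads(blob).get(item_key)


def grouped_set(cache, item_key, value):
    # Read-modify-write of the whole bucket. This is NOT atomic; with a
    # shared cache, two writers to the same bucket can overwrite each other.
    bkey = bucket_key(item_key)
    blob = cache.get(bkey)
    bucket = pickle.loads(blob) if blob is not None else {}
    bucket[item_key] = value
    cache.set(bkey, pickle.dumps(bucket))
```

The trade-off is that every read pulls a whole bucket and every write rewrites one, so this only pays off if the per-object overhead really is the problem. Against a real memcached the write path would probably need the protocol's gets/cas commands to avoid lost updates.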

On May 12, 2:37 pm, Henrik Schröder <[email protected]> wrote:
> I would actually say that it's not the size of the data that's relevant
> here, but you're saying that you don't need a synchronized cache, and you
> also said that generating the cached data is fairly cheap, so in your case
> local caches with short expiration times would be much better for you.
>
> Memcached will work fine for you, it will be much faster than going to the
> db every time, but from your description your application can't really
> benefit from the main points of memcached, i.e. that it is a distributed
> synchronized cache. It works best when you have objects that are expensive
> to create, but can be cached for a long time and properly invalidated or
> updated in the cache, and where you need the cache to be synchronized.
>
> I would definitely try a local cache as well in your case and measure that
> against your memcached implementation.
>
> /Henrik
>
> On Tue, May 12, 2009 at 20:58, JonB <[email protected]> wrote:
>
> > On May 12, 6:31 pm, Henrik Schröder <[email protected]> wrote:
> > > Yes, but how would you do cache invalidations?
>
> > > Right now in your existing application, you don't have to worry about
> > > when the underlying data changes, since all reads go to the one and only
> > > storage of this data. But if you add a cache layer, you have to start
> > > worrying about cache invalidation. If you have a local cache, and one
> > > machine changes the underlying data, you somehow have to tell all your
> > > other machines to refresh parts of their caches as well. How would you do
> > > that? How much would that mechanism cost?
>
> > If the data was cached locally by the application, it would use the
> > same system as memcached, i.e. the data is given an 'expire by time'
> > when it should be discarded - on that machine. This will obviously be
> > different on each machine, so each machine would discard it at
> > different times, but so long as the data is invalidated, say 5 minutes
> > after it was last read - that isn't an issue.
>
> > The application doesn't care if 'Machine A' changes data that's in
> > 'Machine B's cache - so long as it knows 'Machine B's cache will only
> > persist for 5 minutes.
>
> > > On the other hand, if you use memcached, the cache is shared, so if one
> > > machine changes what's in memcached, all other machines will pick up the
> > > new values immediately. This way, updates are cheaper, but at a higher
> > > read cost. What's the ratio of updates/reads in your application?
>
> > > Or is cache invalidation irrelevant for your application? Will it run
> > > fine with stale data?
>
> > It's not quite irrelevant, it's just 'very tolerant' of stale data.
>
> > At the end of the day it comes down to:
>
> >  - Do I make a single function call, to return a few bytes of data
> > from a 'global slab' of RAM on the local machine.
>
> > or,
>
> >  - Do I make a memcached call to retrieve what could be a few bytes of
> > data, from the memcache 'system'.
>
> > Both will do what I want - I guess what I'm looking for is for someone
> > to say either:
>
> > "It sounds like, for the small data sets you're working with, there won't
> > be much benefit to using memcache - when you take into account the
> > processing/network overheads etc."
>
> > or,
>
> > "The overheads aren't that great - they'll still be a huge amount less
> > than hitting the MySQL server, they'll still save the server for more
> > write bandwidth, and at least it'll scale and give a coherent cache
> > view across 'n' number of machines".
>
> > I think I'll just install it - see how easy it is to integrate, and
> > then run some test loads through the system and see what it does... I
> > have no doubt it'll make the MySQL server's life easier (shielding it
> > from the bulk of several hundred SELECTs a second) - and if the front
> > end is still spending more time 'doing useful work' than the overhead
> > of using the cache, we'll go with it.
>
> > Phew - I think I just answered the question :)
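
For comparison, the local 'expire by time' cache JonB describes in the quoted thread could be sketched roughly like this in Python. The class name, the loader callback, and the injectable clock are all hypothetical. It also assumes the expiry is fixed from the moment a value is loaded, not extended on each read, so that a frequently read key still gets refreshed every 5 minutes.

```python
import time


class LocalTTLCache:
    """Per-process cache whose entries are discarded a fixed time after loading."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # 300 s matches the 5 minutes mentioned in the thread
        self.clock = clock      # injectable so the expiry logic can be tested
        self._data = {}         # key -> (expires_at, value)

    def get(self, key, loader):
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database or network round trip
        # Missing or expired: regenerate via the caller-supplied loader
        # (e.g. a function that runs the SELECT) and remember the result.
        value = loader(key)
        self._data[key] = (now + self.ttl, value)
        return value
```

Each machine holds its own copy and expires it independently, which is exactly the 'very tolerant of stale data' trade-off discussed above. Nothing here evicts entries that are never read again, so a real version would need a size limit as well.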
