On May 12, 6:31 pm, Henrik Schröder wrote:
> Yes, but how would you do cache invalidations?
>
> Right now in your existing application, you don't have to worry about when
> the underlying data changes, since all reads go to the one and only storage
> of this data. But if you add a cache layer, you have to start worrying about
> cache invalidation. If you have a local cache, and one machine changes the
> underlying data, you somehow have to tell all your other machines to refresh
> parts of their caches as well. How would you do that? How much would that
> mechanism cost?
If the data were cached locally by the application, it would use the
same scheme as memcached: each item is given an expiry time, after
which it is discarded on that machine. The expiry will obviously fall
at a different time on each machine, so each machine would discard its
copy at a different moment, but so long as the data is invalidated,
say, 5 minutes after it was last read from the database, that isn't an
issue.
The application doesn't care if Machine A changes data that is in
Machine B's cache, so long as it knows Machine B's copy will only
persist for 5 minutes.
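A minimal sketch of the per-machine 'expire by time' cache described
above, in Python. The class name, the 300-second default and the
injectable clock are illustrative assumptions, not any existing
library's API, and memcached itself is not involved. Each process keeps
its own dict, so two machines holding the same key will expire it at
different moments, which is exactly the behaviour described.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class LocalTTLCache:
    """Hypothetical per-process cache: every entry expires a fixed number
    of seconds after it was stored, like memcached's expire-by-time."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        # key -> (absolute expiry time, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        # The expiry is fixed when the value is stored (e.g. after a SELECT).
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        # Returns None on a miss; in this sketch a stored None is
        # indistinguishable from a miss.
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired on this machine only: drop it so the next read refetches.
            del self._entries[key]
            return None
        return value


# Usage with a fake clock, so expiry can be shown without waiting 5 minutes.
now = [0.0]
cache = LocalTTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("user:42", {"name": "alice"})
now[0] = 299
print(cache.get("user:42"))  # {'name': 'alice'}
now[0] = 300
print(cache.get("user:42"))  # None
```

Nothing here tells other machines about a change; staleness is bounded
only by the TTL, which is the trade-off being accepted.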
> On the other hand, if you use memcached, the cache is shared, so if one
> machine changes what's in memcached, all other machines will pick up the new
> values immediately. This way, updates are cheaper, but at a higher read
> cost. What's the ratio of updates/reads in your application?
>
> Or is cache invalidation irrelevant for your application? Will it run fine
> with stale data?
It's not quite irrelevant; the application is just 'very tolerant' of
stale data.
At the end of the day it comes down to:
- Do I make a single function call to return a few bytes of data
from a 'global slab' of RAM on the local machine?
or
- Do I make a memcached call to retrieve what could be a few bytes of
data from the memcached 'system'?
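Seen from the application, both options are the same cache-aside read:
look in the cache, and only on a miss run the SELECT and store the
result. A minimal Python sketch follows; `Cache`, `get_or_load`,
`DictCache` and `load_user` are hypothetical names, and a real memcached
client would need a small adapter to fit this get/set shape (for
example, to pass an expiry time when setting).

```python
from typing import Any, Callable, Dict, List, Optional, Protocol


class Cache(Protocol):
    """Anything with get/set: an in-process store, or a thin adapter
    around a memcached client (hypothetical, not a specific library)."""
    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any) -> None: ...


def get_or_load(cache: Cache, key: str, load: Callable[[], Any]) -> Any:
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = load()          # the expensive path, e.g. a MySQL SELECT
        cache.set(key, value)
    return value


class DictCache:
    """Simplest possible local backend: a plain dict with no expiry."""
    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


# Usage: count how many simulated SELECTs reach the "database".
calls: List[int] = []

def load_user() -> dict:
    calls.append(1)             # one entry per simulated SELECT
    return {"id": 42, "name": "alice"}

cache = DictCache()
for _ in range(100):
    get_or_load(cache, "user:42", load_user)
print(len(calls))  # 1
```

Because the calling code only sees get/set, swapping the local store
for a memcached-backed adapter should be a small change, which keeps the
comparison between the two options cheap to try.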
Both will do what I want. I guess what I'm looking for is for someone
to say either:
"It sounds like, for the small data sets you're working with, there
won't be much benefit to using memcached once you take into account
the processing and network overheads."
or,
"The overheads aren't that great: they'll still be far smaller than
the cost of hitting the MySQL server, they'll still free up the server
for more write bandwidth, and at least it'll scale and give a coherent
cache view across 'n' machines."
I think I'll just install it, see how easy it is to integrate, and
then run some test loads through the system and see what it does. I
have no doubt it'll make the MySQL server's life easier (shielding it
from the bulk of several hundred SELECTs a second), and if the front
end still spends more time 'doing useful work' than on the overhead
of using the cache, we'll go with it.
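For the 'run some test loads' step, a crude per-call timing harness is
enough to see whether the cache overhead is small next to the useful
work. This Python sketch uses only the standard library's
`time.perf_counter`; `mean_call_seconds` is a hypothetical helper, and
the memcached and MySQL calls are left as comments because they depend
on your client library. It measures single-threaded latency only, not
throughput under concurrent load.

```python
import time
from typing import Callable


def mean_call_seconds(fn: Callable[[], object], n: int = 10_000) -> float:
    """Average wall-clock seconds per call of fn, over n calls."""
    if n <= 0:
        raise ValueError("n must be positive")
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


# Baseline: the "single function call into local RAM" option.
local = {"user:42": b"a few bytes"}
local_cost = mean_call_seconds(lambda: local.get("user:42"))
print(f"local dict lookup: {local_cost * 1e6:.3f} us/call")

# To compare, time the same key through your memcached client and through
# the MySQL query it replaces, e.g. (hypothetical client object):
#   mean_call_seconds(lambda: client.get("user:42"), n=1_000)
```

If the memcached figure is still a small fraction of the SELECT it
replaces, the shared, coherent cache is likely worth its overhead.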
Phew - I think I just answered the question :)