On Jun 18, 2008, at 23:02, Daniel wrote:
> I have a couple other ideas I'll share in case anyone likes them...
>
> 1.) Let the application decide when an object is "too stale."
> Currently memcached is set up so an expired element is never
> available, even though it's still in memory.
>
> Perhaps another way memcached could work is to report back, with the
> data, the age of the data. Then the app can decide if that is too old
> or not, based on its needs, and refresh as necessary.
This is one of the approaches that has already been described.
Adding further metadata to the cache and changing the protocol to
return it isn't really an option at this point, but it's easy to add
to the data in your application.
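For example, a rough sketch of wrapping the age into the value itself,
assuming a pymemcache-style Python client (the address, key names, and
threshold here are just illustrative):

import json
import time

from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))  # illustrative address

MAX_AGE = 300  # seconds this application is willing to tolerate


def set_with_age(key, value):
    # Store the write time alongside the value.
    payload = json.dumps({"stored_at": time.time(), "value": value})
    client.set(key, payload.encode("utf-8"))


def get_if_fresh(key):
    raw = client.get(key)
    if raw is None:
        return None  # true miss
    payload = json.loads(raw)
    if time.time() - payload["stored_at"] > MAX_AGE:
        return None  # present, but too old for this caller
    return payload["value"]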
>
> 2.) Rather than having a dog pile, you could set a magic "I'm getting
> that" marker which is written to the cache on a miss (best if it's
> even part of the original get request, actually). Other processes,
> rather than jumping to the database, just wait in a loop with some
> random timeouts, calling memcached repeatedly until the data is
> available.
You can do that today with a derived key.
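E.g. derive a "building" key from the real one and let add() decide who
does the rebuild; a sketch assuming a pymemcache-style client (names
and timings are made up):

import random
import time

from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))  # illustrative address


def fetch(key, rebuild, lock_ttl=30, wait=0.05, attempts=20):
    # `rebuild` is the application's expensive load (e.g. a DB query)
    # and is assumed to return bytes or str.
    value = client.get(key)
    if value is not None:
        return value

    # Derived key: whoever wins the add() owns the rebuild.
    if client.add(key + ":building", b"1", expire=lock_ttl, noreply=False):
        value = rebuild()
        client.set(key, value)
        client.delete(key + ":building")
        return value

    # Someone else is rebuilding; poll with small random sleeps.
    for _ in range(attempts):
        time.sleep(wait + random.uniform(0, wait))
        value = client.get(key)
        if value is not None:
            return value

    # Gave up waiting; fall back to the expensive path.
    return rebuild()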
>
> Now, for the big question I've been sitting on for months...
>
> Has anyone worked out a system using memcached that guarantees
> memcached has the most recent data? Essentially, tying memcached to
> the database so memcached is never allowed to contain any stale data?
Yes, many people have done things like this.

Well, "guarantee" is actually a bit of a difficult word because it's
not like you get two-phase commit or anything, but I used to have an
application that would push cache updates through as part of DB
updates. I'd actually push the cache updates through *before* the DB
writes because the DB writes were async and conflicts were resolvable.
If not that, then you can at least have a post-transaction cache
replacement (cache_fu in Rails supports this out of the box, and I've
built similar things for Java a few times).
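A rough sketch of the post-transaction replacement idea, with a
hypothetical DB-API connection and a made-up users table; the point is
only that the cache write happens after the commit succeeds:

import json


def save_user(db, cache, user_id, fields):
    # `db` is assumed to be a DB-API connection, `cache` a memcached
    # client, and the `users` table / %s paramstyle are illustrative.
    with db:  # commits on clean exit, rolls back on exception
        cur = db.cursor()
        cur.execute(
            "UPDATE users SET name = %s, email = %s WHERE id = %s",
            (fields["name"], fields["email"], user_id),
        )

    # Only reached if the transaction committed: replace the cached
    # copy rather than just expiring it, so the next reader doesn't
    # have to hit the database at all.
    cache.set("user:%d" % user_id, json.dumps(fields).encode("utf-8"))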
>
> I've looked at some solutions involving a timestamp with every
> record, a revision code, database row locking, etc. I think I've
> determined that it can be made to work with the data itself, by
> disabling caching when multiple writes are being processed, however I
> was hoping to find out if anyone's actually made it work.
Why would you disable caching just because something's writing?
There's always a last write.
Having a version column (I used to call it a ``write token'' or
something like that) ensures that you are writing against the correct
data.
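A write token can be as simple as an optimistic-locking UPDATE that
only lands if the version you read is still current; a sketch with a
made-up users table (the %s paramstyle is the psycopg2/MySQLdb one):

def update_with_token(db, user_id, new_name, expected_version):
    # Assumes a `users` table with an integer `version` column; the
    # names here are illustrative, not from any particular schema.
    cur = db.cursor()
    cur.execute(
        "UPDATE users SET name = %s, version = version + 1 "
        "WHERE id = %s AND version = %s",
        (new_name, user_id, expected_version),
    )
    db.commit()
    if cur.rowcount == 0:
        # Someone else wrote since we read; re-read and retry,
        # or do the three-way merge described below.
        raise RuntimeError("stale write token for user %s" % user_id)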
One option to ensure correctness is to always read from the DB when
doing a write and do a three-way merge between the state you were
originally in, the state you were trying to push, and the state of the
records currently in the DB. It all depends on what you're doing.
>
> From what I understand, a system like this can only work if every
> application that accesses the data does its part, but I haven't seen
> any proven examples, and it seems to be a highly complex interface
> that would require some really amazing programming magic.
I built a lock server that I do similar things with. You can create
cross-machine locks to mutually exclude operations that need to be
serialized across multiple systems (e.g. I use it for async jobs that
perform search index updates and propagation). It's not meant to be
hugely fast, so I wouldn't do it for every single row, but I haven't
found a need for such a thing yet.
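The lock server itself isn't shown here, but a crude approximation of a
cross-machine lock can be built on memcached's add(); memcached may
evict or lose the key, so this is only best-effort (names and timings
are illustrative):

import time
import uuid

from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))  # illustrative address


def acquire(name, ttl=60, timeout=10.0):
    # add() succeeds for exactly one caller while the key exists, which
    # gives a best-effort lock; an eviction or restart silently drops it.
    token = uuid.uuid4().hex.encode()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if client.add("lock:" + name, token, expire=ttl, noreply=False):
            return token
        time.sleep(0.2)
    return None


def release(name, token):
    # Check-and-delete isn't atomic here; a real lock server (or a
    # gets/cas pair) is needed if that race matters.
    if client.get("lock:" + name) == token:
        client.delete("lock:" + name)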
Most of this is just the tired ramblings of someone guessing at
requirements, though. Once there are particular constraints for an
application, the mechanism to ensure correctness becomes clearer.
--
Dustin Sallings