If your goal is only to make memcached into a reliable datastore, then I
think you are perhaps going about it in the wrong way.  The memcached server
is extremely well written and tuned, and it does its job incredibly well and
very efficiently.  If you want to ensure that it is deterministic, I think
that you should do that work on the client side rather than on the server
side.

We, for example, did some work in the past to store our user data (we don't
really use sessions in the traditional sense of the word, but this is
probably the closest thing we have outside of our cookie) in memcached
because the load on our primary database was just too high.  In order to
make it deterministic, we wrote our own client and did a special setup.

We had several servers (started with 3, ended up growing it to 5 before we
replaced it with TokyoTyrant) that had identical configurations, such that
each server had more than enough memory to fit the entire dataset.  We then
wrote a client that had the following behavior:

- Writes were sent to every server
- All updates to the database had to also be written to memcached in order
to be considered a success
- Reads were performed on a randomly selected server
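
To make those three rules concrete, here is a minimal sketch in Python. It
is not our actual client: FakeServer is an in-memory stand-in for a
memcached connection, and the names ReplicatedClient, update_user and the
"user:<id>" key format are made up for illustration.

```python
import random


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


class ReplicatedClient:
    """Writes go to every server; reads go to one server picked at random."""

    def __init__(self, servers):
        self.servers = list(servers)

    def set(self, key, value):
        # Attempt the write on every server, then report success only if
        # all of them accepted it.
        results = [server.set(key, value) for server in self.servers]
        return all(results)

    def get(self, key):
        # Every server holds the full dataset, so any one of them will do.
        return random.choice(self.servers).get(key)


def update_user(db, cache, user_id, record):
    """An update succeeds only if the database and the cache both took it."""
    db[user_id] = record  # `db` is a plain dict standing in for the database
    if not cache.set("user:%d" % user_id, record):
        raise RuntimeError("cache write failed; treat the update as failed")
```

What to do when the cache write fails (retry, roll back the database write,
pull the server out of rotation) is a policy decision that the sketch leaves
to the caller.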

We also wrote a populate-user-cache script that could fill a new server with
the required data. Since we have about 30 million users, this job took quite
a while, so we also built in the idea of an "is populated" flag.  This flag
would not be set by the populate script until it was totally finished
replicating the data.  The client code was written such that it could write
to a server that didn't have the "is populated" flag, but would never read
from it.  This meant that we could bring up new servers and they would
receive live writes right away, but would only be used for reads once they
were accurate (the populate-user-cache script only issued add commands,
which store a key only if it does not already exist, so it could not
clobber any data being written by actual traffic).
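
A rough sketch of the populate logic, again in Python with an in-memory
stand-in for the server. Storing the flag as an ordinary key on the server
itself, the key name "is_populated", and the function names are all
assumptions made for the example, not a description of our real script.

```python
import random

POPULATED_FLAG = "is_populated"  # hypothetical key name for the flag


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # Same semantics as the memcached add command: store the value
        # only if the key is not already present.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


def populate(server, users):
    """Fill a new server, then mark it readable once everything is loaded."""
    for user_id, record in users:
        # add never overwrites, so a fresher value written by live traffic
        # while this loop is running is left alone.
        server.add("user:%d" % user_id, record)
    server.set(POPULATED_FLAG, True)


def readable_servers(servers):
    """Only servers that carry the flag are eligible for reads."""
    return [s for s in servers if s.get(POPULATED_FLAG)]


def read(servers, key):
    candidates = readable_servers(servers)
    if not candidates:
        raise RuntimeError("no populated server available")
    return random.choice(candidates).get(key)
```

Writes keep going to every server regardless of the flag; only the read
path filters on it.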

One of the key features of this setup was that every server had the full
dataset-- this meant that we could build a page that needed data for, say,
500 users and load it with almost no more latency than needed to get the
data for one user because of how well memcached handles multi-gets.
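
As an illustration of why that works, here is a small Python sketch. The
get_multi method on the stand-in server plays the role of a memcached get
with several keys in one request; load_users and the "user:<id>" key format
are hypothetical names for the example.

```python
import random


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0  # counts requests, to show there is only one

    def get_multi(self, keys):
        # One request carries all the keys; missing keys are simply
        # absent from the reply.
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


def load_users(servers, user_ids):
    """Fetch every requested user from a single server in one multi-get."""
    keys = ["user:%d" % uid for uid in user_ids]
    # Any server will do, because each one holds the full dataset.
    server = random.choice(servers)
    found = server.get_multi(keys)
    return {uid: found.get("user:%d" % uid) for uid in user_ids}
```

If the keys were spread across servers by hash, the same page would need
one request per server that holds any of the keys.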

We don't use this setup anymore because we moved to using TokyoTyrant as our
persistent cache layer, but I will say that it worked pretty much flawlessly
for about two years.  There was no way that our database would have been
able to handle the necessary read load, but these servers performed
exceedingly well, easily handling over 30,000 gets per second.

Anyway, I think that building something similar might do a much better job
of performing the task you're attempting.  The key thing to recognize is
that memcached is built to do a specific task and it's _GREAT_ at it, so you
should use it for what it does best. Let me know if any of this doesn't make
sense to you or if you have any further questions.

-- 
awl
