Ok, I'll have a look at TokyoTyrant. Do you have numbers comparing the write performance of memcached and tt?
Cheers,
Martin

On Sun, Mar 14, 2010 at 2:20 AM, Adam Lee <[email protected]> wrote:
> If your goal is only to make memcached into a reliable datastore, then I
> think you are perhaps going about it in the wrong way. The memcached server
> is extremely well written and tuned and does its job incredibly well and
> very efficiently. If you want to ensure that it is deterministic, I think
> that you should write your code on the client side rather than on the
> server side.
>
> We, for example, did some work in the past to store our user data (we don't
> really use sessions in the traditional sense of the word, but this is
> probably the closest thing we have outside of our cookie) in memcached
> because the load on our primary database was just too high. In order to
> make it deterministic, we wrote our own client and did a special setup.
>
> We had several servers (started with 3, ended up growing it to 5 before we
> replaced it with TokyoTyrant) that had identical configurations, such that
> each server had more than enough memory to fit the entire dataset. We then
> wrote a client that had the following behavior:
>
> - Writes were sent to every server
> - All updates to the database had to also be written to memcached in order
>   to be considered a success
> - Reads were performed on a randomly selected server
>
> We also wrote a populate-user-cache script that could fill a new server
> with the required data. Since we have about 30 million users, this job took
> quite a while, so we also built in the idea of an "is populated" flag. This
> flag would not be set by the populate script until it was totally finished
> replicating the data. The client code was written such that it could write
> to a server that didn't have the "is populated" flag, but would never read
> from it. This meant that we could bring up new servers and they would be
> populated with new data, but would only be used once they were accurate
> (the populate-user-cache script only issued add commands, making sure that
> it didn't clobber any data being written by actual traffic).
>
> One of the key features of this setup was that every server had the full
> dataset. This meant that we could build a page that needed data for, say,
> 500 users and load it with almost no more latency than needed to get the
> data for one user because of how well memcached handles multi-gets.
>
> We don't use this setup anymore because we moved to using TokyoTyrant as
> our persistent cache layer, but I will say that it worked pretty much
> flawlessly for about two years. There was no way that our database would
> have been able to handle the necessary read load, but these servers
> performed exceedingly well, easily handling over 30,000 gets per second.
>
> Anyway, I think that building something similar might do a much better job
> of performing the task you're attempting. The key thing to recognize is
> that memcached is built to do a specific task and it's _GREAT_ at it, so you
> should use it for what it does best. Let me know if any of this doesn't make
> sense to you or if you have any further questions.
>
> --
> awl
>

--
Martin Grotzke
http://www.javakaffee.de/blog/
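
For reference, the client behavior described in the quoted message (write to every server, read from one random server that carries the "is populated" flag, populate with add only) could be sketched roughly as below. This is only an illustration under assumptions, not the actual client: FakeServer is an in-memory stand-in rather than a real memcached connection, and the names ReplicatedClient, populate, and the "is_populated" key are all hypothetical.

```python
import random


class FakeServer:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def add(self, key, value):
        # Like memcached's "add": store only if the key is not already present.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get_multi(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}


POPULATED_FLAG = "is_populated"  # assumed key name for the "is populated" flag


class ReplicatedClient:
    def __init__(self, servers):
        self.servers = list(servers)

    def write(self, key, value):
        # Writes go to every server, populated or not.
        # The list forces every set to run before all() checks the results.
        return all([s.set(key, value) for s in self.servers])

    def read_many(self, keys):
        # Reads go to one randomly chosen server that has finished populating.
        ready = [s for s in self.servers if s.get_multi([POPULATED_FLAG])]
        if not ready:
            raise RuntimeError("no populated server available")
        return random.choice(ready).get_multi(keys)


def populate(server, source):
    # Use add so that values already written by live traffic are not clobbered.
    for key, value in source.items():
        server.add(key, value)
    # Set the flag only after the full copy is done.
    server.set(POPULATED_FLAG, "1")
```

A caller would treat a database update as successful only if write() also returns True, and a new server starts receiving reads only once populate() has set the flag on it.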
