On Tue, Feb 23, 2010 at 7:46 PM, Henrik Schröder <[email protected]> wrote:
> Yes, if you use automatic recovery from failover your cache can get
> unsynchronized as different parts of your application discover that a
> previous failing server is now back up at different points in time.
>
> If synchronization is very important to your application, make sure you
> don't use automatic recovery from failover, and you won't get this problem.
> The flipside to that is that when you want to put back servers in the
> cluster, you need to restart your application so that all parts of it get
> the updated server list at the same time.
>
> Another way of solving the problem is to not use failover at all. If your
> application is fine with more cache misses as long as one of your cache
> servers is down, then that solution is the best. You will never have
> synchronization problems, and you don't have to restart your application to
> bring back servers into the cluster.
>

Another option is to make your configuration dynamically reloadable. As long as your client code doesn't hold any instance for longer than a request, it should be fairly easy to change at runtime. We built a configuration mechanism in our system such that any app we bring up listens on an "admin configuration" topic in our message queue, so all we need to do is push a new config to all machines and issue a ReloadMcConfig command on the queue. All clients will be dynamically updated with the new config within a few seconds and synchronization problems are basically nonexistent.

> In reality though, I have never seen a memcached server crash in production.
> It is very, very stable. Normally you don't have to worry about what happens
> if one server goes down, because they never do.
>

Yes, it's very, very rare that memcached fails-- I don't know that memcached itself has ever actually crashed on us in production, though we have had hardware failures or the like. More often than not, we only use this mechanism to add new machines to the queue or to alter pool setups.
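
For anyone who wants a picture of the reload idea, here is a rough Python sketch. It is not our actual code: the McConfig class, the message shape (a dict with "command" and "servers" keys), the host names, and the plain modulo hashing are all assumptions made for illustration. The only name taken from the description above is the ReloadMcConfig command. A real client library would do its own key-to-server mapping.

```python
import hashlib
import threading


class McConfig:
    """Holds the current server list; the whole list is swapped on reload."""

    def __init__(self, servers):
        self._lock = threading.Lock()
        self._servers = tuple(servers)

    def servers(self):
        # Request code reads the list once per request and never caches it longer.
        with self._lock:
            return self._servers

    def reload(self, servers):
        # Replace the list in one step so readers see the old or the new list, never a mix.
        with self._lock:
            self._servers = tuple(servers)


def server_for(key, servers):
    """Map a key to a server by hashing (simple modulo, assumed for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def handle_admin_message(config, message):
    """Apply one message from the (hypothetical) admin topic; True if a reload happened."""
    if message.get("command") == "ReloadMcConfig":
        config.reload(message["servers"])
        return True
    return False
```

Each app instance would own one McConfig and run handle_admin_message for every message it receives on the admin topic. Once every instance has processed the same ReloadMcConfig message, they all hold the same server list and so map any given key to the same server, which is what keeps the window for unsynchronized caches down to the few seconds the broadcast takes.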
As an aside, is there an FAQ entry anywhere about this synchronization scenario? It seems like almost everybody who's first introduced to memcached jumps through these same mental hoops and thinks they've found a fatal flaw in the design. I feel like it'd be advantageous if there was a help item somewhere that explained how memcached works as well as it does precisely because machines are completely unaware of each other; simplicity and consistency are the keys (pardon the pun) to memcached.

-- awl
