Hi, I've been running a Couchbase 2.1 cluster of 8 dedicated machines (Ubuntu 12.04, 32 GB RAM, 2x3 TB HDD) with around 250M docs. It had been stable for months, but after upgrading the nodes to 2.2 the cluster suddenly went into the following unstable state:
Nodes started going down (really down: no ping, no SSH), and at one point 6 of the 8 machines were down. I restarted the down ones, which then went into warmup, taking 2-3 hours each. They came back up, but then other nodes went down, and so on. That situation defeats the purpose of a replicated cluster and causes a lot of headaches. I haven't been able to figure out which logs to check for crash reports, so I'm hoping you can help with some guidance.

It's not the first time I've seen a CB cluster react like this. I have another cluster in a different data center that exhibited the same behavior until I was finally able to add more nodes and stabilize it. The problem there is that I can't rebalance one of the buckets, but I'll probably write about that in another post.

I can see that some nodes have begun using a lot of swap, even though swappiness is set to zero.

First, has anyone seen this behavior before, and can you give any pointers on where to look for more info? Is this how Couchbase responds to too few nodes?

Second, can anyone recommend experts to hire for these sorts of situations? Unfortunately we're not in a position where we can afford the enterprise solution, but we could sure use some consultancy help now and then.

Thanks,
Lasse
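For the swap observation above, here is a minimal sketch of how the numbers could be collected on each node. It assumes a Linux host with the standard /proc files (/proc/sys/vm/swappiness, /proc/meminfo, and the VmSwap field in /proc/<pid>/status). The function names are made up for illustration and nothing in it is Couchbase-specific.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(meminfo):
    """Swap in use, in kB, computed as SwapTotal minus SwapFree."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def process_swap_kb(status_text):
    """Return the VmSwap value (kB) from /proc/<pid>/status content, 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def process_name(status_text):
    """Return the Name field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return "?"


def top_swappers(proc_root="/proc", limit=5):
    """List (swap_kb, pid, name) for the processes holding the most swap."""
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # process exited or is not readable
        swapped = process_swap_kb(status)
        if swapped > 0:
            results.append((swapped, int(entry.name), process_name(status)))
    return sorted(results, reverse=True)[:limit]


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    swappiness = Path("/proc/sys/vm/swappiness").read_text().strip()
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    print("vm.swappiness:", swappiness)
    print("swap used (kB):", swap_used_kb(meminfo))
    for swapped, pid, name in top_swappers():
        print(f"{swapped:>10} kB  pid {pid}  {name}")
```

Running this on a node that is swapping shows whether the swap is held by the Couchbase processes or by something else on the machine, which helps narrow down where to look next.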
