Hi,

I've been running a Couchbase 2.1 cluster of 8 dedicated machines (Ubuntu 
12.04, 32GB RAM, 2x3TB HDD) with around 250M docs. It had been stable for 
months, but after I upgraded the nodes to 2.2, the cluster suddenly went into 
the following unstable state:

Nodes started going down (really down: no ping, no SSH), and up to 6 of the 8 
machines went down. I restarted the ones that were down, and each went into 
warmup, which took 2-3 hours per node. They came back up, but then other 
nodes went down, and so on.

That situation defeats the purpose of a replicated cluster and causes a lot 
of headache.

I haven't been able to figure out which log to check for crash reports, so 
I'm hoping you can offer some guidance. This isn't the first time I've 
seen a CB cluster react like this. I have another cluster in a different 
data center that exhibited the same behavior until I was finally able to 
add more nodes and stabilize it. The remaining problem there is that I can't 
rebalance one of the buckets, but I'll probably write about that in 
another post.

I can see that some nodes have begun using a lot of swap, even though 
swappiness is set to zero.

First, has anyone seen this behavior before, and can you give any pointers 
on where to look for more info? Is this how Couchbase responds to having too 
few nodes?

Second, can anyone recommend experts to hire for these sorts of 
situations? Unfortunately we're not in a position to afford the 
enterprise solution, but we could certainly use some consulting help now 
and then.

Thanks,
Lasse


-- 
You received this message because you are subscribed to the Google Groups 
"Couchbase" group.