On Tue, Mar 11, 2014 at 1:40 PM, Lasse Schou <[email protected]> wrote:

> Hi,
>
> I've been running a Couchbase 2.1 cluster of 8 dedicated machines (Ubuntu
> 12.04, 32GB RAM, 2x3TB HDD) with around 250M docs. It has been stable for
> months, but after upgrading the nodes to 2.2 the cluster suddenly went into
> the following unstable state:
>
> Nodes started going down (real down, can't ping or SSH), up to 6 of the 8
> machines went down. Restarted the down ones which went into warmup, taking
> 2-3 hours each. Went up again, but other nodes went down, etc, etc.
>
> That situation defeats the purpose of a replicated cluster and causes a
> lot of headache.
>
> I haven't been able to figure out what log to look for crash reports in,
> so I'm hoping you can help with some guidance. It's not the first time I've
> seen a CB cluster react like this. I have another cluster in a different
> data center that exhibited the same behavior until I was finally able to
> add more nodes and stabilize the cluster. The problem there is that I can't
> rebalance one of the buckets - but I'll probably write about that in
> another post.
>
> I can see that some nodes have begun using a lot of swap, even though
> swappiness is set to zero.
>
> First, has anyone seen this behavior before and can you give any pointers
> on where to look for more info? Is this how Couchbase responds to too few
> nodes?
>

No. I guess that all your nodes are in some sort of border condition. From
your description it appears likely that it's a swap storm.

In order to say more I'll need more data. Feel free to grab cbcollect_info
output, ideally from all nodes, and post it somewhere. Filing a JIRA ticket
is one way to do that.
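
To confirm the swap usage on each node in the meantime, the kernel's own
counters can be read directly. Below is a minimal sketch, assuming Linux
nodes with /proc/meminfo and /proc/sys/vm/swappiness available; the function
names are illustrative only and are not part of any Couchbase tool.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap in use, in kB, computed as SwapTotal minus SwapFree."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


if __name__ == "__main__":
    # Assumes a Linux host; these paths do not exist on other systems.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    with open("/proc/sys/vm/swappiness") as f:
        swappiness = int(f.read().strip())
    print("swappiness:", swappiness)
    print("swap used (kB):", swap_used_kb(info))
```

Running this on every node at intervals shows whether swap use keeps growing
before a node becomes unreachable.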
