Thanks for the quick reply. I'm uploading all the files now. Currently available:
https://lasseschou.s3.amazonaws.com/eu-couchbase-02.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-03.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-04.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-07.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-08.zip

The others will follow soon, but these should be enough to get a good idea
of what's wrong.
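
Regarding the swap usage mentioned in my original post below: here is a minimal sketch of how swap could be checked on each node while the remaining files upload. It assumes a Linux /proc filesystem (as on Ubuntu 12.04); the function names parse_meminfo and swap_report are made up for illustration and are not part of any Couchbase tooling.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines that are not "Name: <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_report(meminfo_text, swappiness_text):
    """Summarize swap usage alongside the vm.swappiness setting."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    used = total - free
    return {
        "swappiness": int(swappiness_text.strip()),
        "swap_total_kb": total,
        "swap_used_kb": used,
        "swap_used_pct": round(100.0 * used / total, 1) if total else 0.0,
        "mem_free_kb": info.get("MemFree", 0),
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    swappiness = Path("/proc/sys/vm/swappiness")
    if meminfo.exists() and swappiness.exists():
        print(swap_report(meminfo.read_text(), swappiness.read_text()))
    else:
        print("This sketch needs a Linux /proc filesystem.")
```

Running it on each node gives the swap-used percentage next to the swappiness value. As far as I understand, swappiness set to zero does not disable swap entirely, so the kernel may still swap under heavy memory pressure.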


2014-03-11 21:57 GMT+01:00 Aliaksey Kandratsenka <[email protected]>:

>
>
>
> On Tue, Mar 11, 2014 at 1:40 PM, Lasse Schou <[email protected]> wrote:
>
>> Hi,
>>
>> I've been running a Couchbase 2.1 cluster of 8 dedicated machines (Ubuntu
>> 12.04, 32GB RAM, 2x3TB HDD) with around 250M docs. It has been stable for
>> months, but after upgrading the nodes to 2.2 the cluster suddenly went into
>> the following unstable state:
>>
>> Nodes started going down (completely down, no response to ping or SSH);
>> up to 6 of the 8 machines went down. I restarted the down nodes, which
>> then went into warmup, taking 2-3 hours each. They came back up, but then
>> other nodes went down, and so on.
>>
>> That situation defeats the purpose of a replicated cluster and causes a
>> lot of headache.
>>
>> I haven't been able to figure out which log to look in for crash reports,
>> so I'm hoping you can help with some guidance. It's not the first time I've
>> seen a CB cluster react like this. I have another cluster in a different
>> data center that exhibited the same behavior until I was finally able to
>> add more nodes and stabilize the cluster. The problem there is that I can't
>> rebalance one of the buckets - but I'll probably write about that in
>> another post.
>>
>> I can see that some nodes have begun using a lot of swap, even though
>> swappiness is set to zero.
>>
>> First, has anyone seen this behavior before and can you give any pointers
>> on where to look for more info? Is this how Couchbase responds to too few
>> nodes?
>>
>
> No. I guess that all your nodes are in some sort of border condition. From
> your description it appears likely that it's a swap storm.
>
> In order to say more I'll need more data. Feel free to grab cbcollect_info
> output, ideally from all nodes, and post it somewhere. Filing a JIRA ticket
> is one way to do that.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Couchbase" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
