Thanks for your quick reply. Uploading all files. Currently available:

https://lasseschou.s3.amazonaws.com/eu-couchbase-02.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-03.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-04.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-07.zip
https://lasseschou.s3.amazonaws.com/eu-couchbase-08.zip

...others to come soon, but this should be enough to get a good idea of what's wrong.

2014-03-11 21:57 GMT+01:00 Aliaksey Kandratsenka <[email protected]>:

> On Tue, Mar 11, 2014 at 1:40 PM, Lasse Schou <[email protected]> wrote:
>
>> Hi,
>>
>> I've been running a Couchbase 2.1 cluster of 8 dedicated machines (Ubuntu
>> 12.04, 32GB RAM, 2x3TB HDD) with around 250M docs. It has been stable for
>> months, but after upgrading the nodes to 2.2 the cluster suddenly went into
>> the following unstable state:
>>
>> Nodes started going down (really down: can't ping or SSH), up to 6 of the 8
>> machines at once. I restarted the down ones, which went into warmup, taking
>> 2-3 hours each. They came up again, but then other nodes went down, and so on.
>>
>> That situation defeats the purpose of a replicated cluster and causes a
>> lot of headache.
>>
>> I haven't been able to figure out which log to look in for crash reports,
>> so I'm hoping you can help with some guidance. It's not the first time I've
>> seen a CB cluster react like this. I have another cluster in a different
>> data center that exhibited the same behavior until I was finally able to
>> add more nodes and stabilize the cluster. The problem there is that I can't
>> rebalance one of the buckets - but I'll probably write about that in
>> another post.
>>
>> I can see that some nodes have begun using a lot of swap, even though
>> swappiness is set to zero.
>>
>> First, has anyone seen this behavior before and can you give any pointers
>> on where to look for more info? Is this how Couchbase responds to too few
>> nodes?
>
> No. I guess that all your nodes are in some sort of border condition. From
> your description, it appears likely that it's a swap storm.
>
> In order to say more I'll need more data. Feel free to grab cbcollect_info,
> ideally from all nodes, and post it somewhere. Filing a JIRA ticket is one
> way to do that.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Couchbase" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
