Re: Repair failure under 0.8.6

Peter Schuller Sun, 04 Dec 2011 13:30:09 -0800

> I will try to increase phi_convict -- I will just need to restart the
> cluster after the edit, right?


You will need to restart the nodes for which you want the phi convict
threshold to be different. You might want to do this on, e.g., half of
the cluster so that you can do A/B testing.

> I do recall that I see nodes temporarily marked as down, only to pop up
> later.

I recommend grepping through the logs on all the nodes (e.g., cat
/var/log/cassandra/cassandra.log | grep UP | wc -l). That should tell
you quickly whether they all seem to be seeing roughly as many node
flaps, or whether some particular node or set of nodes is
over-represented.
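
If it helps, here is a minimal sketch of that comparison, assuming you
have copied each node's log to one machine (the function name and file
names are hypothetical). It uses the same loose "UP" match as the
command above, so treat the counts as relative rather than exact.

```shell
# Print "<count> <file>" for each log file given, highest count first.
# Counts every line containing "UP", same as the grep above.
flap_totals() {
    for f in "$@"; do
        printf '%s %s\n' "$(grep -c UP "$f")" "$f"
    done | sort -rn
}
```

You would run it as e.g. "flap_totals node1.log node2.log node3.log"
and look for a file whose count stands out from the rest.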

Next, look at the actual nodes flapping (remove wc -l) and see if all
nodes are flapping or if it is a single node, or a subset of the nodes
(e.g., sharing a switch perhaps).
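
To see which endpoints are flapping from one node's point of view, a
rough sketch like the following could work. It assumes the gossiper
logs lines of the form "InetAddress /10.0.0.1 is now UP"; that format
is my assumption, so check a few lines of your own log and adjust the
pattern if it differs. The function name is made up for illustration.

```shell
# Count "is now UP" events per remote endpoint in a single log file.
# Assumes lines containing "InetAddress <addr> is now UP".
flaps_by_endpoint() {
    sed -n 's/.*InetAddress \([^ ]*\) is now UP.*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}
```

Running "flaps_by_endpoint /var/log/cassandra/cassandra.log" on each
node prints one line per endpoint with its count first, which makes a
single bad node or a group behind one switch easier to spot.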

> In the current situation, there is no load on the cluster at all,
> outside the maintenance like the repair.

Ok. So what I'm getting at is that there may be real connectivity
problems that you aren't noticing in any other way, since you don't
have active traffic to the cluster.


-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
