I can't say that I have tried that while the issue is going on, but I have done such rolling restarts for sure, and the timeouts still occur every day. What would a rolling restart do to fix the issue?
In fact, as I write this, I am restarting each node one by one in the eu-west-1 datacenter, and in us-east-1 I am seeing lots of timeouts - both the metrics 'Connection.TotalTimeouts.m1_rate' and 'ClientRequest.Latency.Read.p999' flatlining at ~6s. Why would restarting in one datacenter impact reads in another?

Any suggestions on what to investigate next, or what changes to try in the cluster? Happy to provide any more info as well :)

On Fri, Feb 17, 2017 at 6:05 AM, kurt greaves <k...@instaclustr.com> wrote:
> have you tried a rolling restart of the entire DC?