We are trying to detect a scenario where some of our smaller clusters go
un-repaired for extended periods of time, mostly due to defects in
deployment pipelines or human error.
We would like to automate a check for clusters whose nodes go
un-repaired for more than 7 days, to shoot out an
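A check like this can be sketched in a few lines, assuming the last successful repair time per node is already available (e.g. scraped from repair logs or from `system_distributed.repair_history`); the function name and input shape below are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Maximum age before a node is flagged as un-repaired (7 days, per the check above).
REPAIR_THRESHOLD = timedelta(days=7)

def stale_nodes(last_repair_times, now=None):
    """Return the nodes whose last successful repair is older than the threshold.

    `last_repair_times` maps a node address to the datetime of its last
    successful repair (hypothetical input shape; in practice this would be
    collected from repair logs or system_distributed.repair_history).
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        node for node, last in last_repair_times.items()
        if now - last > REPAIR_THRESHOLD
    )

# Example: one node repaired 10 days ago, one repaired yesterday.
now = datetime(2017, 1, 15, tzinfo=timezone.utc)
ages = {
    "10.0.0.1": now - timedelta(days=10),
    "10.0.0.2": now - timedelta(days=1),
}
print(stale_nodes(ages, now))  # -> ['10.0.0.1']
```

The flagged list could then feed whatever alerting mechanism the pipeline already uses.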
Hi,
I'm using Cassandra 2.2.8 with the default NTR queue configuration (
max_queued_native_transport_requests = 128, native_transport_max_threads =
128), and from the metrics I'm seeing that some native transport requests are
being blocked.
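For reference, the two knobs live in different places in 2.2: `native_transport_max_threads` is a `cassandra.yaml` setting, while the queue bound is a JVM system property set in `cassandra-env.sh`. A sketch of where each would be raised (the 1024 value is illustrative):

```shell
# cassandra.yaml: cap on concurrently executing NTR worker threads
#   native_transport_max_threads: 128

# conf/cassandra-env.sh: bound on queued (not yet executing) NTR requests,
# set via a JVM system property (value shown is illustrative)
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"
```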
I'm trying to understand what happens to the blocked native
Hello all,
I’ve been doing more analysis and I have a few questions:
1. We observed that most of the requests are blocked on the NTR queue. I
increased the queue size from 128 (default) to 1024, and this time the system
does recover automatically (latencies go back to normal) without removing node
Hi everyone,
I was finally able to sort out my problem in an "interesting" manner that I
think is worth sharing on the list!
What I did is the following: on each node, I stopped Cassandra, completely
dropped the data files of the column family, started Cassandra again and
issued a repair for
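The per-node procedure described above can be sketched roughly as follows; the service name, data directory layout, and keyspace/table names are all placeholders, and the data-file path varies by Cassandra version:

```shell
# Run on each node in turn (placeholder names throughout):
sudo service cassandra stop

# Drop the column family's data files on disk
# (placeholder path; the layout under the data directory is version-dependent)
rm -rf /var/lib/cassandra/data/my_keyspace/my_table-*

sudo service cassandra start

# Re-stream the dropped data back from the replicas
nodetool repair my_keyspace my_table
```

Doing this one node at a time keeps the remaining replicas available to serve the repair.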