Re: How to read content of hints file and apply them manually?

2020-01-28 Thread Erick Ramirez
I would do a thread dump and work out which threads are the highest CPU consumers. But in my experience, 90% of the time it's GC from high app traffic, unless you've hit an edge-case bug. That means the cluster doesn't have enough capacity and you need to review the cluster size. Cheers!
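A rough illustration of the thread-dump step: `top -H -p <pid>` lists per-thread CPU usage by native thread ID in decimal, while a HotSpot thread dump (for example from `jstack <pid>`) tags each thread with the same ID in hex as `nid=0x<hex>`. The minimal sketch below joins the two. The function names are hypothetical, and the regex assumes the usual HotSpot thread-header layout.

```python
import re

# Header line of each thread in a HotSpot thread dump, for example:
# "MutationStage-3" #88 daemon prio=5 os_prio=0 tid=0x00007f02 nid=0x1f50 waiting on condition
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def parse_thread_names(dump_text):
    """Map native thread ID (int) -> thread name for every thread header in the dump."""
    names = {}
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # int() accepts the "0x" prefix when base=16.
            names[int(match.group("nid"), 16)] = match.group("name")
    return names


def hottest_threads(cpu_by_tid, dump_text, top_n=5):
    """Join per-thread CPU% (keyed by the decimal TID that `top -H` prints)
    with thread names from the dump, highest CPU first."""
    names = parse_thread_names(dump_text)
    ranked = sorted(cpu_by_tid.items(), key=lambda item: item[1], reverse=True)
    return [(tid, names.get(tid, "<not in dump>"), cpu) for tid, cpu in ranked[:top_n]]
```

If the top entries are GC threads rather than request stages such as MutationStage, that points towards the GC/capacity explanation above.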

2020-01-28 Thread Surbhi Gupta
So the problem we face is that every time a node goes down, or a node is under high load or CPU, we see lots of hints pile up and not get applied on the other nodes. Last time this happened we noticed high pending mutations, but when I went back and checked the history of events, not every
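One low-risk way to see where hints are piling up is to size the backlog per target node without decoding any hint contents. The sketch below assumes the Cassandra 3.0+ file naming `<host-id>-<timestamp>-<version>.hints` inside the directory configured by `hints_directory` in cassandra.yaml; the function names are hypothetical. The host IDs can then be matched to nodes using the Host ID column of `nodetool status`.

```python
import os
import re
from collections import defaultdict

# Assumed Cassandra 3.0+ naming: <target host ID>-<creation time in ms>-<version>.hints
HINT_FILE = re.compile(
    r"^(?P<host>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<ts>\d+)-(?P<ver>\d+)\.hints$"
)


def summarize_hints(entries):
    """entries: iterable of (filename, size_bytes).
    Returns {host_id: (file_count, total_bytes, oldest_timestamp_ms)}."""
    summary = defaultdict(lambda: [0, 0, None])
    for name, size in entries:
        match = HINT_FILE.match(name)
        if not match:
            continue  # skips .crc32 checksum files and anything unexpected
        stats = summary[match.group("host")]
        stats[0] += 1
        stats[1] += size
        ts = int(match.group("ts"))
        stats[2] = ts if stats[2] is None else min(stats[2], ts)
    return {host: tuple(values) for host, values in summary.items()}


def scan_hints_dir(path):
    """Summarize the hint files found in a local hints directory."""
    with os.scandir(path) as it:
        return summarize_hints((e.name, e.stat().st_size) for e in it if e.is_file())
```

A host whose oldest timestamp keeps getting older between runs is one whose hints are not being delivered.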

2020-01-28 Thread Patrick McFadin
I would definitely check the IO stats then. If you see latency going over 20 ms, you need to solve that problem. Patrick On Tue, Jan 28, 2020 at 12:01 PM Surbhi Gupta wrote: > We have also noticed a lot of pending MutationStage tasks. > > On Tue, 28 Jan 2020 at 11:06, Richard Andersen wrote: >
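The 20 ms rule of thumb can be checked against the same counters that `iostat -x` uses for its `await` figure: the change in time spent on reads and writes divided by the change in completed I/Os between two samples of `/proc/diskstats`. This is a Linux-only sketch; the function names and the 5-second interval are illustrative assumptions.

```python
import time

# Column layout of /proc/diskstats (Linux kernel iostats documentation):
# major minor name reads_completed reads_merged sectors_read ms_reading
#       writes_completed writes_merged sectors_written ms_writing ...
READS, MS_READ, WRITES, MS_WRITE = 3, 6, 7, 10


def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent)} summed over reads and writes."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        ios = int(fields[READS]) + int(fields[WRITES])
        ms = int(fields[MS_READ]) + int(fields[MS_WRITE])
        stats[fields[2]] = (ios, ms)
    return stats


def await_ms(before, after):
    """Average latency per completed I/O between two samples, like iostat's await."""
    result = {}
    for dev, (ios1, ms1) in after.items():
        if dev not in before:
            continue
        ios0, ms0 = before[dev]
        d_ios = ios1 - ios0
        result[dev] = (ms1 - ms0) / d_ios if d_ios else 0.0
    return result


def slow_devices(interval_s=5.0, threshold_ms=20.0):
    """Sample /proc/diskstats twice and return devices above the latency threshold."""
    with open("/proc/diskstats") as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as fh:
        after = parse_diskstats(fh.read())
    return {dev: ms for dev, ms in await_ms(before, after).items() if ms > threshold_ms}
```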

2020-01-28 Thread Surbhi Gupta
We have also noticed a lot of pending MutationStage tasks. On Tue, 28 Jan 2020 at 11:06, Richard Andersen wrote: > I am in agreement with Patrick; this is a typical symptom of saturated IO. > Is there a high number of drops and/or pending compactions?

2020-01-28 Thread Richard Andersen
I am in agreement with Patrick; this is a typical symptom of saturated IO. Is there a high number of drops and/or pending compactions? From: Patrick McFadin Sent: Tuesday, January 28, 2020 11:25:49 AM To:
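Dropped messages, pending compactions and the MutationStage backlog can all be read from `nodetool tpstats` and `nodetool compactionstats`. The sketch below parses their text output; the column layout is assumed from Cassandra 3.x and may differ between versions, and the helper names and thresholds are hypothetical.

```python
import re


def parse_tpstats(text):
    """Return ({pool: pending}, {message_type: dropped}) from `nodetool tpstats` text.

    Assumes a 3.x-style layout: a table whose header starts with "Pool Name"
    and contains a "Pending" column, then a table headed "Message type  Dropped".
    """
    pending, dropped = {}, {}
    section, pending_col = None, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[:2] == ["Pool", "Name"]:
            section = "pools"
            # Header tokens after "Pool Name" line up with the numeric row values.
            pending_col = tokens[2:].index("Pending")
        elif tokens[:2] == ["Message", "type"]:
            section = "dropped"
        elif section == "pools":
            values = tokens[1:]
            if len(values) > pending_col and values[pending_col].isdigit():
                pending[tokens[0]] = int(values[pending_col])
        elif section == "dropped" and len(tokens) >= 2 and tokens[1].isdigit():
            dropped[tokens[0]] = int(tokens[1])
    return pending, dropped


def pending_compactions(text):
    """Read the "pending tasks: N" line from `nodetool compactionstats`, if present."""
    match = re.search(r"pending tasks:\s*(\d+)", text)
    return int(match.group(1)) if match else None


def red_flags(tpstats_text, compactionstats_text, pending_limit=0, compaction_limit=0):
    """List signs of a saturated node; the limits are illustrative, not recommendations."""
    pending, dropped = parse_tpstats(tpstats_text)
    flags = [f"{pool} pending={n}" for pool, n in pending.items() if n > pending_limit]
    flags += [f"{kind} dropped={n}" for kind, n in dropped.items() if n > 0]
    compactions = pending_compactions(compactionstats_text)
    if compactions is not None and compactions > compaction_limit:
        flags.append(f"pending compactions={compactions}")
    return flags
```

The input text could come from something like `subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout`.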

2020-01-28 Thread Patrick McFadin
Just to add to this: any time I see hints on a cluster, that's like seeing smoke. If you can't explain it, you have a fire somewhere and it's not going to get any better. From the few messages I've seen, I would start by looking at the IO subsystem on your nodes. Do you have enough throughput