The problem we face is that every time a node goes down, or a node is under
high load or high CPU, we see lots of hints pile up and not get applied to the
other nodes. The last time this happened we noticed high pending
mutations, but when I went back and checked the history of events, we did not
see high pending mutations every time. So basically high load and CPU
caused high pending mutations, but I don't think it was the other way around.

Using the top command, it was very clear that Cassandra was the cause of the
high CPU.

Other than top, iostat, and iotop, what tools do you use to dig into high load
and high CPU issues?

On Tue, Jan 28, 2020 at 1:12 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I would definitely check the IO stats then. If you see latency going over
> 20ms, you need to solve that problem.
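Patrick's 20 ms rule of thumb can be checked without eyeballing iostat. Below
is a minimal, Linux-only sketch, assuming the standard /proc/diskstats field
layout: it takes two snapshots of that file and computes the average time per
completed read/write on each device (roughly what iostat reports as await),
then flags anything over 20 ms. The function names are hypothetical and not
part of any Cassandra tool.

```python
# Hypothetical helpers: approximate per-device I/O await (ms) from two
# /proc/diskstats snapshots. Linux-only; assumes the kernel's documented layout:
# [2]=device, [3]=reads completed, [6]=ms reading,
# [7]=writes completed, [10]=ms writing.


def parse_diskstats(text):
    """Map device name -> (completed I/Os, total ms spent on them)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def await_ms(before, after):
    """Average ms per completed I/O between two snapshots, per device."""
    b = parse_diskstats(before)
    a = parse_diskstats(after)
    result = {}
    for dev, (ios_after, ms_after) in a.items():
        if dev not in b:
            continue  # device appeared between snapshots
        ios_before, ms_before = b[dev]
        d_ios = ios_after - ios_before
        if d_ios > 0:  # no completed I/O means no meaningful average
            result[dev] = (ms_after - ms_before) / d_ios
    return result


def slow_devices(before, after, threshold_ms=20.0):
    """Devices whose average await exceeded the threshold (default 20 ms)."""
    return sorted(d for d, ms in await_ms(before, after).items() if ms > threshold_ms)
```

To use it, read /proc/diskstats, wait a few seconds, read it again, and pass
both strings to slow_devices(). The figure is an average over the interval, so
short latency spikes can still hide inside it.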
>
> Patrick
>
> On Tue, Jan 28, 2020 at 12:01 PM Surbhi Gupta <surbhi.gupt...@gmail.com>
> wrote:
>
>> We have also noticed a lot of pending tasks in MutationStage.
>>
>>
>> On Tue, 28 Jan 2020 at 11:06, Richard Andersen <rich...@andersenfamily.us>
>> wrote:
>>
>>> I agree with Patrick; this is a typical symptom of saturated
>>> IO. Is there a high number of drops and/or pending compactions?
>>>
>>> ------------------------------
>>> *From:* Patrick McFadin <pmcfa...@gmail.com>
>>> *Sent:* Tuesday, January 28, 2020 11:25:49 AM
>>> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
>>> *Subject:* Re: How to read content of hints file and apply them
>>> manually?
>>>
>>> Just to add here: any time I see hints on a cluster, that's like
>>> seeing smoke. If you can't explain it, you have a fire somewhere and it's
>>> not going to get any better.
>>>
>>> From the few messages I've seen, I would start by looking at the IO
>>> subsystem on your nodes. Do you have enough throughput to write and read at
>>> the same time? These are exactly the symptoms I see when running Cassandra
>>> on a SAN or NAS.
>>>
>>> Patrick
>>>
>>> On Mon, Jan 27, 2020 at 8:17 PM Surbhi Gupta <surbhi.gupt...@gmail.com>
>>> wrote:
>>>
>>> We tried setting sethintedhandoffthrottlekb to 100, 1024, and 10240, but
>>> nothing helped.
>>> Our hints-related parameters are listed below. If a parameter isn't
>>> listed, it isn't set in our environment and should be at its default
>>> value.
>>>
>>> max_hint_window_in_ms: 10800000 # 3 hours
>>>
>>> hinted_handoff_enabled: true
>>>
>>> hinted_handoff_throttle_in_kb: 100
>>>
>>> max_hints_delivery_threads: 8
>>>
>>> hints_directory: /var/lib/cassandra/hints
>>>
>>> hints_flush_period_in_ms: 10000
>>>
>>> max_hints_file_size_in_mb: 128
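To see why a small hinted_handoff_throttle_in_kb can make replay crawl, here is
a back-of-the-envelope sketch. It assumes the behaviour described in the comment
shipped with cassandra.yaml: the throttle applies per delivery thread and is
divided by (number of nodes - 1). It also assumes, hypothetically, that a single
thread delivers one target's hints and that the throttle is the only bottleneck,
so the result is a lower bound. Please verify the scaling rule against your
Cassandra version; the function names are illustrative only.

```python
# Back-of-the-envelope hint replay estimate (hypothetical helpers).
# Assumption: per-thread rate = hinted_handoff_throttle_in_kb / (nodes - 1),
# as described in the stock cassandra.yaml comment.


def effective_throttle_kb_per_s(throttle_kb: float, cluster_nodes: int) -> float:
    """Per-delivery-thread rate after scaling by the cluster size."""
    if cluster_nodes < 2:
        raise ValueError("hint delivery needs at least two nodes")
    return throttle_kb / (cluster_nodes - 1)


def min_replay_seconds(hint_mb: float, throttle_kb: float, cluster_nodes: int) -> float:
    """Lower bound on seconds to replay hint_mb of hints for one target node,
    assuming one thread delivers them and nothing but the throttle limits it."""
    rate_kb_per_s = effective_throttle_kb_per_s(throttle_kb, cluster_nodes)
    return hint_mb * 1024 / rate_kb_per_s
```

For example, with the 100 KB/s throttle above and a hypothetical 6-node
cluster, each thread gets 20 KB/s, so a single 128 MB hints file needs at least
about 6,554 seconds (roughly 1.8 hours) to replay.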
>>>
>>> On Mon, 27 Jan 2020 at 18:34, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>
>>> The high CPU is probably the hints getting replayed and slamming the write
>>> path.
>>>
>>> Slowing it down with the hint throttle may help.
>>>
>>> It's not instant.
>>>
>>> On Jan 27, 2020, at 6:05 PM, Erick Ramirez <flightc...@gmail.com> wrote:
>>>
>>>
>>>
>>> Increase the max_hint_window_in_ms setting in cassandra.yaml to more
>>> than 3 hours, perhaps 6 hours. If the issue still persists, the network may
>>> need to be tested for bandwidth issues.
>>>
>>>
>>> Just a note of warning about bumping up the hint window without
>>> understanding the pros and cons. Be aware that doubling it means:
>>>
>>>    - you'll end up doubling the size of stored hints in
>>>    the hints_directory
>>>    - there'll be twice as many hints to replay when node(s) come back
>>>    online
>>>
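Erick's warning can be put as simple arithmetic. This sketch assumes a constant,
hypothetical write rate destined for the down node. Hints stop accumulating once
the node has been down longer than max_hint_window_in_ms, so the stored volume
grows with min(downtime, window): doubling the window doubles the worst case,
which is reached when an outage lasts at least as long as the new window.

```python
# Hypothetical estimate of hints stored for one down node.
# Assumption: a constant write rate (MB/s) that would have gone to that node.


def stored_hint_mb(write_rate_mb_per_s: float, downtime_ms: int,
                   max_hint_window_ms: int) -> float:
    """Hints are only written while the node is down AND inside the window."""
    effective_ms = min(downtime_ms, max_hint_window_ms)
    return write_rate_mb_per_s * effective_ms / 1000


THREE_HOURS_MS = 10_800_000  # the max_hint_window_in_ms value quoted above
SIX_HOURS_MS = 2 * THREE_HOURS_MS
```

With a hypothetical 0.5 MB/s of writes and an 8-hour outage, a 3-hour window
stores about 5,400 MB of hints and a 6-hour window about 10,800 MB; for a
2-hour outage both windows store the same 3,600 MB.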
>>> There are always two sides to fiddling with the knobs in C*. Cheers!
>>>
>>>
