Hi,

seeing this here as well. Basically, at some point during a repair within
the cluster (incremental, parallel, partitioner range, only one node at
a time), some node (sometimes the repairing node, sometimes another)
starts piling up READs, i.e. pending reads go through the roof. Although
only one node is affected at a time and enough other, well-behaved nodes
are available to satisfy our quorum reads, this impacts the read
performance of the whole cluster.

We have multiple C* 2.2 (2.2.5) clusters, basically running the same
application but with different amounts of load. This symptom only appears
on one of our clusters, which has significantly more usage than most
others. This is also our largest cluster, having about 3 times as many
machines as most other ones (and RF 5 instead of RF 3).

We did not see this before 2.0 and also only started to see this on that
particular cluster. We didn't see this on any other cluster after
upgrading from 2.0 (but then again, they're not as loaded).

At first we suspected the incremental repair, because we also had heap
pressure issues with it on a 4GB heap. We went to 6GB and there is no
more heap pressure, but the problem persists. Our suspicion has not
changed, though, as this bad behaviour coincides with repair and
specifically with anticompaction going on.

We see a clear temporal correlation between the beginning of such an
"event" and open file handles rising while, at the same time, active
validations skyrocket. The same goes for the end of the event, which
clearly coincides with the validations completing and the number of
file handles dropping dramatically. However, both of these "beginning
markers" seem to be required: active validations skyrocketing without
open file handles skyrocketing at the same time does not produce these
symptoms.
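
For anyone who wants to check the file handle correlation on their own
nodes, here is a minimal sketch of how the descriptor count could be
sampled. It assumes Linux and that the script runs as the Cassandra user
(or root) so that /proc/<pid>/fd is readable. The function names are
made up for illustration and are not part of any Cassandra tooling.

```python
import os
import time


def count_open_fds(pid, listdir=os.listdir):
    """Count the open file descriptors of a process via /proc/<pid>/fd (Linux only)."""
    return len(listdir("/proc/{}/fd".format(pid)))


def sample_open_fds(pid, samples, interval_s=1.0, listdir=os.listdir, sleep=time.sleep):
    """Return a list of (unix_timestamp, fd_count) tuples, one per sample."""
    series = []
    for i in range(samples):
        series.append((time.time(), count_open_fds(pid, listdir)))
        # Do not sleep after the last sample.
        if i + 1 < samples:
            sleep(interval_s)
    return series
```

Calling sample_open_fds(cassandra_pid, 60) yields one minute of
per-second samples that can be lined up against the validation counts.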

The only thing we have found so far that made a difference in these
situations is compaction throughput. When we decreased the compaction
throughput in these events, pending reads piled up even more and even
more quickly. Beyond that we're still pretty much in the dark. Anyway,
something is locking up Cassandra internally.
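
To make the throughput experiment repeatable, something like the
following sketch could be used. It wraps `nodetool getcompactionthroughput`
and `nodetool setcompactionthroughput`, takes a measurement at a
temporary setting, and restores the previous value afterwards. It
assumes the "get" command prints a line ending in "<n> MB/s";
`measure` is a hypothetical callable supplied by the caller (e.g.
something that reads pending reads). Changing the throughput on a
production node is at your own risk.

```python
import re
import subprocess


def parse_throughput(text):
    """Extract the MB/s figure from `nodetool getcompactionthroughput` output (assumed format)."""
    match = re.search(r"(\d+)\s*MB/s", text)
    return int(match.group(1)) if match else None


def measure_at_throughput(mb_per_s, measure, run=subprocess.check_output):
    """Set a temporary compaction throughput, call measure(), then restore the old value."""
    before = parse_throughput(run(["nodetool", "getcompactionthroughput"]).decode())
    run(["nodetool", "setcompactionthroughput", str(mb_per_s)])
    try:
        return measure()
    finally:
        # Restore only if the previous value could be parsed.
        if before is not None:
            run(["nodetool", "setcompactionthroughput", str(before)])
```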

We suspect that there may be a "synchronized" somewhere it shouldn't be
(or should be solved differently) but that's just a guess. We'll try to
produce some jstacks but the events are pretty elusive because they
happen suddenly and don't last very long (except when we're not watching
closely -_-).
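
Since the events are so short, one way to catch them might be to poll
`nodetool tpstats` and take a thread dump with `jstack -l` as soon as
pending reads cross a threshold. The sketch below is only that, a
sketch: it assumes the tpstats row for reads is named "ReadStage" with
the pending count in the third column, and that the script runs as the
same user as the Cassandra JVM. The function names and the threshold
are made up for illustration.

```python
import subprocess


def read_stage_pending(tpstats_output):
    """Return the Pending value of the ReadStage row, or None if the row is missing."""
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Assumed row layout: name, active, pending, completed, ...
        if len(fields) >= 3 and fields[0] == "ReadStage":
            return int(fields[2])
    return None


def poll_once(pid, threshold, run=subprocess.check_output):
    """Return a jstack dump if pending reads are at or above threshold, else None."""
    pending = read_stage_pending(run(["nodetool", "tpstats"]).decode())
    if pending is not None and pending >= threshold:
        return run(["jstack", "-l", str(pid)]).decode()
    return None
```

Calling poll_once(cassandra_pid, 100) in a loop with a few seconds of
sleep, and writing any non-None result to a timestamped file, should be
enough to capture a dump while an event is in progress.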

Anyway, here are some graphs to illustrate what I've tried to describe:

(1) CPU Usage of Cassandra (green) and open file descriptors (yellow,
second y-axis)


(2) Threads: new threads per second (orange line, second y-axis)


(3) See below; active and pending tasks on second y-axis


(4) Compactions and Validations: Active Tasks (Bars, second y-axis) and
Completed Tasks / s (lines)


You can see around 08:03 an event started with a sudden rise in active
validations and multiple sudden increases in open file descriptors. The
event lasts until 08:46 with a sudden drop in open file descriptors and
a huge peak in new threads per second.

During the event you can see Cassandra's CPU usage drops significantly.
Same goes for GC activity (graph not included here, because STW GC only
happens about once every 50 minutes and then takes only a fraction of a
second).

As you can see, there's another such event later on, but it is much
smaller and shorter. Between the events, the pattern with the
validations continues the same way without problems. The only
difference: no significant change in open file descriptor count.

I have system graphs as well, but have not included them because they
show no problems: CPU usage goes down during the event, there is no I/O
wait on the CPU, and disk ops/s as well as throughput actually go down
too.

During the depicted time frame there was a repair (incremental,
parallel, partitioner range) running on a different machine within the
cluster. We've switched back to -pr because when running without -pr
these events happen more often and are more pronounced, but I think
that's just a different degree of the same underlying problem.

Interestingly we had a similar issue in another cluster last night,
which runs C* 2.1.13 and does NOT yet use incremental repair (just full
repair with -pr).

Any chance something in the read path is affected by the set compaction
throughput and/or running compactions? It definitely seems that
Cassandra is severely restricting itself here.

Best regards,
Dominik

On 26.02.2016 at 17:42, horschi wrote:
> Hi,
>
> I just had a weird behaviour on one of our Cassandra nodes, which I
> would like to share:
>
> Short version:
> My pending reads went up from ~0 to the hundreds when I reduced the
> compactionthroughput from 16 to 2.
>
>
> Long version:
>
> One of our more powerful nodes had a few pending reads, while the
> other ones didn't. So far nothing special.
>
> Strangely neither CPU, nor IO Wait, nor disk-ops/s, nor C*-heap was
> particularly high. So I was wondering.
>
> That machine had two compactions and a validation(incremental)
> running, so I set the compactionthroughput to 2. To my surprise I saw
> the pending reads go up to the hundreds within 5-10 seconds. Setting
> the compactionthroughput back to 16 and the pending reads went back to
> 0 (or at least close to zero).
>
> I kept the compactionthroughput on 2 for less than a minute. So the
> issue is not compactions falling behind.
>
> I was able to reproduce this behaviour 5-10 times. The pending reads
> went up every time I *de*creased the compactionthroughput. I watched
> the pending reads while the compactionthroughput was on 16, and I
> never observed even a two digit pending read count while it was on
> compactionthroughput 16.
>
> Unfortunately the machine does not show this behaviour any more. Also
> it was only a single machine.
>
>
>
> Our setup:
> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6GB heap
>
>
> My question:
> Did someone else ever observe such a behaviour?
>
> Is it perhaps possible that the read-path shares a lock with
> repair/compaction that waits on ThrottledReader while holding that lock?
>
>
> kind regards,
> Christian

