Hi,

we are seeing this here as well. Basically, at some point during a repair within the cluster (incremental, parallel, partitioner range, only one node at a time), some node (sometimes the repairing node, sometimes another one) starts piling up reads, i.e. pending reads go through the roof. Although only one node is affected at a time and enough other, well-behaving nodes are available to satisfy our quorum reads, this impacts the read performance of the whole cluster.
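By "pending reads" I mean the Pending column of the ReadStage pool in nodetool tpstats. A minimal sketch of how that can be sampled on a node (assuming nodetool is on the PATH and can reach the node; the script name and host argument are just for illustration):

    # read_pending.py - print the ReadStage pending count for one node
    # (sketch: assumes nodetool is on the PATH and can reach the node)
    import subprocess
    import sys

    def pending_reads(host="localhost"):
        out = subprocess.check_output(["nodetool", "-h", host, "tpstats"],
                                      universal_newlines=True)
        for line in out.splitlines():
            cols = line.split()
            # tpstats rows: Pool Name, Active, Pending, Completed, Blocked, All time blocked
            if cols and cols[0] == "ReadStage":
                return int(cols[2])
        raise RuntimeError("ReadStage not found in tpstats output")

    if __name__ == "__main__":
        print(pending_reads(sys.argv[1] if len(sys.argv) > 1 else "localhost"))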
We have multiple C* 2.2 (2.2.5) clusters, basically running the same application but with different amounts of load. This symptom only appears on one of our clusters, which sees significantly more usage than most of the others. It is also our largest cluster, with about three times as many machines as most other ones (and RF 5 instead of RF 3). We did not see this while we were still on 2.0, and we only see it on that particular cluster. We haven't seen it on any other cluster after upgrading from 2.0 either (but then again, they are not as loaded).

At first we suspected the incremental repair, because we did have issues with it as well, namely heap pressure with a 4 GB heap. We went to 6 GB and the heap pressure is gone, but the problem persists. Repair remains the prime suspect though, as this bad behaviour coincides with repair runs and specifically with anticompaction going on.

We see a clear temporal correlation between open file handles rising together with active validations skyrocketing and the beginning of such an "event". The same goes for the end of such an event, which clearly correlates in time with the validations completing and the number of open file handles dropping dramatically. However, both of these "beginning markers" seem to have to be present: active validations skyrocketing without open file handles skyrocketing at the same time does not produce these symptoms.

The only thing we have found so far that makes a difference in these situations is compaction throughput: when we decreased the compaction throughput during such an event, pending reads piled up even more and even more quickly. Beyond that we are still pretty much in the dark. Anyway, something is locking up Cassandra internally. We suspect there may be a "synchronized" somewhere it shouldn't be (or should be solved differently), but that's just a guess. We'll try to produce some jstacks, but the events are pretty elusive because they happen suddenly and don't last very long (except when we're not watching closely -_-).

Anyway, here are some graphs to illustrate what I've tried to describe:

(1) CPU usage of Cassandra (green) and open file descriptors (yellow, second y-axis)
(2) Threads: new threads per second (orange line, second y-axis)
(3) See below; active and pending tasks on second y-axis
(4) Compactions and validations: active tasks (bars, second y-axis) and completed tasks/s (lines)

You can see that around 08:03 an event started with a sudden rise in active validations and multiple sudden increases in open file descriptors. The event lasts until 08:46, ending with a sudden drop in open file descriptors and a huge peak in new threads per second. During the event, Cassandra's CPU usage drops significantly. The same goes for GC activity (graph not included here, because stop-the-world GC only happens about once every 50 minutes and then takes only a fraction of a second). As you can see, there is another such event later on, but much smaller and shorter, and between the events the validation pattern continues the same way without problems; the only difference is that there is no significant change in the open file descriptor count.

I have system graphs as well but did not include them because they show no problems: CPU usage goes down during the event, there is no I/O wait on the CPU, and disk ops/s as well as disk throughput actually go down too. During the depicted time frame a repair (incremental, parallel, partitioner range) was running on a different machine within the cluster.
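Since the events are so elusive, we will probably end up running a small watchdog on the affected nodes rather than trying to catch them by hand. Roughly along these lines (a sketch, assuming nodetool and jstack are on the PATH, the script runs as the cassandra user on a Linux box, and that a pending-read threshold of 100 is a sensible trigger; threshold and intervals are guesses, adjust to taste):

    # repair_event_watchdog.py - grab jstacks + open fd count when ReadStage pending spikes
    # (sketch: run as the cassandra user on Linux; threshold and intervals are guesses)
    import os
    import subprocess
    import time

    THRESHOLD = 100   # pending ReadStage tasks that count as an "event"
    INTERVAL = 5      # seconds between polls
    DUMPS = 3         # jstacks to take per event

    def pending_reads():
        # same tpstats parsing as in the first sketch
        out = subprocess.check_output(["nodetool", "tpstats"], universal_newlines=True)
        for line in out.splitlines():
            cols = line.split()
            if cols and cols[0] == "ReadStage":
                return int(cols[2])
        return 0

    def cassandra_pid():
        # assumes a single Cassandra process whose command line contains "CassandraDaemon"
        out = subprocess.check_output(["pgrep", "-f", "CassandraDaemon"],
                                      universal_newlines=True)
        return int(out.split()[0])

    def capture(pid, tag):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        with open("jstack-%s-%s.txt" % (stamp, tag), "w") as f:
            subprocess.call(["jstack", str(pid)], stdout=f)
        fds = len(os.listdir("/proc/%d/fd" % pid))  # open file descriptors (Linux)
        print("%s captured jstack #%s, %d open fds" % (stamp, tag, fds))

    if __name__ == "__main__":
        pid = cassandra_pid()
        while True:
            if pending_reads() > THRESHOLD:
                for i in range(DUMPS):
                    capture(pid, str(i))
                    time.sleep(2)
            time.sleep(INTERVAL)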
We've switched back to -pr, because when running repair without -pr these events happen more often and are more pronounced, but I think that is just the same underlying problem being amplified. Interestingly, we had a similar issue on another cluster last night, which runs C* 2.1.13 and does NOT yet use incremental repair (just full repair with -pr).

Any chance something in the read path is affected by the configured compaction throughput and/or running compactions? It definitely looks like Cassandra is severely restricting itself here.

Best regards,
Dominik

On 26.02.2016 at 17:42, horschi wrote:
> Hi,
>
> I just had a weird behaviour on one of our Cassandra nodes, which I
> would like to share:
>
> Short version:
> My pending reads went up from ~0 to the hundreds when I reduced the
> compactionthroughput from 16 to 2.
>
> Long version:
>
> One of our more powerful nodes had a few pending reads, while the
> other ones didn't. So far nothing special.
>
> Strangely, neither CPU, nor I/O wait, nor disk ops/s, nor C* heap was
> particularly high. So I was wondering.
>
> That machine had two compactions and a validation (incremental)
> running, so I set the compactionthroughput to 2. To my surprise I saw
> the pending reads go up to the hundreds within 5-10 seconds. Setting
> the compactionthroughput back to 16 brought the pending reads back to
> 0 (or at least close to zero).
>
> I kept the compactionthroughput at 2 for less than a minute, so the
> issue is not compactions falling behind.
>
> I was able to reproduce this behaviour 5-10 times. The pending reads
> went up every time I *de*creased the compactionthroughput. I watched
> the pending reads while the compactionthroughput was at 16, and I
> never observed even a two-digit pending read count while it was at
> compactionthroughput 16.
>
> Unfortunately the machine does not show this behaviour any more. Also,
> it was only a single machine.
>
> Our setup:
> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6GB heap
>
> My question:
> Did someone else ever observe such a behaviour?
>
> Is it perhaps possible that the read path shares a lock with
> repair/compaction, which waits on ThrottledReader while holding that lock?
>
> kind regards,
> Christian
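PS: For anyone who wants to reproduce the toggle Christian describes above, here is a rough sketch (assuming nodetool is on the PATH; the 2 and 16 MB/s values are taken from his mail, so restore whatever your own baseline is):

    # throughput_toggle.py - reproduce the setcompactionthroughput experiment
    # (sketch: the 2 and 16 MB/s values are from Christian's mail; restore your own baseline)
    import subprocess
    import time

    def set_throughput(mb_per_s):
        subprocess.check_call(["nodetool", "setcompactionthroughput", str(mb_per_s)])

    def pending_reads():
        # same tpstats parsing as in the earlier sketches
        out = subprocess.check_output(["nodetool", "tpstats"], universal_newlines=True)
        for line in out.splitlines():
            cols = line.split()
            if cols and cols[0] == "ReadStage":
                return int(cols[2])
        return 0

    if __name__ == "__main__":
        try:
            set_throughput(2)        # throttle compactions hard
            for _ in range(12):      # watch pending reads for about a minute
                print("%s  pending reads: %d"
                      % (time.strftime("%H:%M:%S"), pending_reads()))
                time.sleep(5)
        finally:
            set_throughput(16)       # always restore the baseline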