Hi,

I just found another one. It's basically the same, but I'll post it anyway:
Thread 84311: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
- java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
- java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
- com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
- com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
- com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
- org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
- org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
- org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
- org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
- org.apache.cassandra.io.sstable.format.big.SimpleSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.composites.Composite) @bci=36, line=57 (Compiled frame)
- org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=38, line=66 (Compiled frame)
- org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
- org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
- org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
- org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
- org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
- org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
- org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
- org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
- org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
- org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
- org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow() @bci=16, line=1587 (Compiled frame)
- org.apache.cassandra.service.StorageProxy$DroppableRunnable.run() @bci=37, line=2232 (Compiled frame)
- java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
- org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
- org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=745 (Compiled frame)

kind regards,
Christian

On Thu, Mar 3, 2016 at 2:31 PM, horschi <hors...@gmail.com> wrote:
> Hi,
>
> we just had such an incident again, and this time we were able to grab a
> jstack-threaddump.
>
> One thread from this stackdump actually shows a ReadVerbHandler calling
> the RateLimiter. From my understanding this should not be:
>
> Thread 30346: (state = BLOCKED)
> - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.setToRowStart(org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput) @bci=13, line=108 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=110, line=84 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=54, line=66 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> - org.apache.cassandra.db.ReadVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=39, line=38 (Compiled frame)
> - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=82, line=67 (Compiled frame)
> - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run() @bci=12, line=136 (Compiled frame)
> - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
>
> Dominik and I would appreciate it if you could give us some feedback. We think it is a bug :-)
>
> Cheers,
> Christian
>
> On Thu, Mar 3, 2016 at 1:08 PM, Dominik Keil <dominik.k...@movilizer.com> wrote:
>> Hi,
>>
>> we are seeing this here as well. Basically, at some point during a repair within the cluster (incremental, parallel, partitioner range, only one node at a time) some node (sometimes the repairing node, sometimes another) starts piling up READs, i.e. pending reads go through the roof. Even though only one node is affected at a time and enough other, well-behaving nodes are available to satisfy our quorum reads, this impacts the read performance of the whole cluster.
>>
>> We have multiple C* 2.2 (2.2.5) clusters, basically running the same application but with different amounts of load. This symptom only appears on one of our clusters, which has significantly more usage than most others. It is also our largest cluster, with about 3 times as many machines as most other ones (and RF 5 instead of RF 3).
>>
>> We did not see this before upgrading from 2.0, and we only started to see it on that particular cluster. We haven't seen it on any other cluster after upgrading from 2.0 (but then again, they're not as loaded).
>>
>> At first we suspected the incremental repair, because we also had heap-pressure issues with it at 4GB of heap. We went to 6GB and there is no more heap pressure, but the problem persists. Repair remains the suspect, as this bad behaviour coincides with repair and specifically with anticompaction going on.
>>
>> We see a clear temporal correlation between open file handles rising together with active validations skyrocketing and the beginning of such an "event". The same goes for the end of the event, which clearly correlates with the validations being completed and the number of file handles dropping dramatically. However, it seems both of these "beginning markers" need to be present: active validations skyrocketing without open file handles skyrocketing at the same time does not produce these symptoms.
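
[Interjecting with a small illustration here, since this is exactly where the two thread dumps above point: the read threads are sitting in a reader that has to take permits from a rate limiter before every buffer refill. The sketch below is only a stand-in for that pattern, not the actual Cassandra code - ThrottledDataReader, BUFFER_SIZE and the 64 KiB value are invented for the example, and the rate is sized roughly like "nodetool setcompactionthroughput 2":]

import com.google.common.util.concurrent.RateLimiter;

// Stand-in for the pattern visible in the dumps: a reader that must take
// permits from a shared RateLimiter before every buffer refill. All names
// here are invented; the real class in the trace is
// org.apache.cassandra.io.compress.CompressedThrottledReader.
public class ThrottledDataReader
{
    private static final int BUFFER_SIZE = 64 * 1024; // hypothetical 64 KiB read buffer

    private final RateLimiter limiter; // shared throttle, permits = bytes per second

    public ThrottledDataReader(RateLimiter limiter)
    {
        this.limiter = limiter;
    }

    // Blocks until BUFFER_SIZE permits (bytes) are available. If a normal read
    // ends up on such a reader, every refill waits on the throttle, which is
    // where the BLOCKED frames above are sleeping.
    void reBuffer()
    {
        double waitedSeconds = limiter.acquire(BUFFER_SIZE);
        // ... the actual disk read into the buffer would happen here ...
        System.out.printf("refilled buffer after waiting %.3f s%n", waitedSeconds);
    }

    public static void main(String[] args)
    {
        // ~2 MB/s, roughly the budget "nodetool setcompactionthroughput 2" implies
        RateLimiter throttle = RateLimiter.create(2d * 1024 * 1024);
        ThrottledDataReader reader = new ThrottledDataReader(throttle);
        for (int i = 0; i < 64; i++)
            reader.reBuffer(); // at 2 MB/s and 64 KiB per refill: at most ~32 refills/s
    }
}

[With many reads (plus validation) sharing one such limiter, lowering the configured throughput makes every refill wait longer, which would match the pending-read pile-up described in this thread.]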
>> The only thing we have found so far that made a difference in these situations is compaction throughput. When we decreased the compaction throughput during these events, pending reads piled up even more and even more quickly. Beyond that we're still pretty much in the dark. In any case, something is locking up Cassandra internally.
>>
>> We suspect that there may be a "synchronized" somewhere it shouldn't be (or that should be solved differently), but that's just a guess. We'll try to produce some jstacks, but the events are pretty elusive because they happen suddenly and don't last very long (except when we're not watching closely -_-).
>>
>> Anyway, here are some graphs to illustrate what I've tried to describe:
>>
>> (1) CPU usage of Cassandra (green) and open file descriptors (yellow, second y-axis)
>>
>> (2) Threads: new threads per second (orange line, second y-axis)
>>
>> (3) See below; active and pending tasks on second y-axis
>>
>> (4) Compactions and validations: active tasks (bars, second y-axis) and completed tasks/s (lines)
>>
>> You can see that around 08:03 an event started, with a sudden rise in active validations and multiple sudden increases in open file descriptors. The event lasts until 08:46, ending with a sudden drop in open file descriptors and a huge peak in new threads per second.
>>
>> During the event you can see that Cassandra's CPU usage drops significantly. The same goes for GC activity (graph not included here, because stop-the-world GC only happens about once every 50 minutes and then takes only a fraction of a second).
>>
>> As you can see, there's another such event later on, but much smaller and shorter, and between the events the pattern with the validations continues the same way without problems - the only difference: no significant change in open file descriptor count.
>>
>> I have system graphs as well but have not included them because they show no problems: CPU usage goes down during that event, there is no I/O wait on the CPU, and disk op/s as well as throughput actually go down too.
>>
>> During the depicted time frame there was a repair (incremental, parallel, partitioner range) running on a different machine within the cluster. We've switched back to -pr because when running repair without -pr these events happen more often and are more pronounced, but I think that's just the same underlying problem showing itself.
>>
>> Interestingly, we had a similar issue in another cluster last night, which runs C* 2.1.13 and does NOT yet use incremental repair (just full repair with -pr).
>>
>> Any chance something in the read path is affected by the configured compaction throughput and/or running compactions? It definitely seems that Cassandra is severely restricting itself here.
>>
>> Best regards,
>> Dominik
>>
>> On 26.02.2016 at 17:42, horschi wrote:
>>
>> Hi,
>>
>> I just had a weird behaviour on one of our Cassandra nodes, which I would like to share:
>>
>> Short version:
>> My pending reads went up from ~0 to the hundreds when I reduced the compactionthroughput from 16 to 2.
>>
>> Long version:
>>
>> One of our more powerful nodes had a few pending reads, while the other ones didn't. So far nothing special.
>>
>> Strangely, neither CPU, nor I/O wait, nor disk ops/s, nor C* heap was particularly high. So I was wondering.
>>
>> That machine had two compactions and a validation (incremental) running, so I set the compactionthroughput to 2.
>> To my surprise, I saw the pending reads go up to the hundreds within 5-10 seconds. When I set the compactionthroughput back to 16, the pending reads went back to 0 (or at least close to zero).
>>
>> I kept the compactionthroughput at 2 for less than a minute, so the issue is not compactions falling behind.
>>
>> I was able to reproduce this behaviour 5-10 times: the pending reads went up every time I *de*creased the compactionthroughput. I also watched the pending reads while the compactionthroughput was at 16, and I never observed even a two-digit pending read count while it was at 16.
>>
>> Unfortunately the machine does not show this behaviour any more. Also, it was only a single machine.
>>
>> Our setup:
>> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6GB heap
>>
>> My question:
>> Did someone else ever observe such a behaviour?
>>
>> Is it perhaps possible that the read path shares a lock with repair/compaction, and that a repair/compaction thread waits on ThrottledReader while holding that lock?
>>
>> kind regards,
>> Christian
>>
>> --
>> Dominik Keil
>> Movilizer GmbH
>> movilizer.com
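
To frame that last question, here is a tiny, self-contained thought experiment (purely hypothetical - SharedLockVsThrottle and everything in it is made up, not Cassandra code): if anything ever sleeps in the throttle while holding a lock the read path needs, every read stalls for the full throttle wait, and the lower the configured rate, the longer the stall.

import com.google.common.util.concurrent.RateLimiter;

// Hypothetical sketch of the "shared lock + throttled reader" scenario from
// the question above. Not Cassandra code; all names are invented.
public class SharedLockVsThrottle
{
    private static final Object SHARED_LOCK = new Object();
    // 2 permits/s: after the first acquire, each acquire waits around half a second
    private static final RateLimiter THROTTLE = RateLimiter.create(2.0);

    public static void main(String[] args)
    {
        Thread compactionLike = new Thread(() -> {
            while (true)
            {
                synchronized (SHARED_LOCK)   // holds the lock ...
                {
                    THROTTLE.acquire();      // ... while sleeping in the throttle
                }
            }
        }, "compaction-like");
        compactionLike.setDaemon(true);
        compactionLike.start();

        for (int i = 0; i < 10; i++)
        {
            long start = System.nanoTime();
            synchronized (SHARED_LOCK)       // the "read" only needs the lock briefly
            {
                // a cheap read would happen here
            }
            System.out.printf("read %d waited %.0f ms for the lock%n",
                              i, (System.nanoTime() - start) / 1e6);
        }
    }
}

That said, the thread dumps above show the read threads blocked inside RateLimiter.acquire() themselves rather than behind a lock, so the simpler reading may be that reads somehow end up on a throttled (compaction) reader. Either way the visible effect - pending reads tracking the compaction throughput setting - would look the same.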