Yep, definitely a bug. Introduced by CASSANDRA-9240 (me; mea culpa).

I've filed a JIRA for you: CASSANDRA-11301
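
To make the failure mode easier to see, here is a minimal sketch of the pattern the stack traces below show. The class and field names are made up for illustration and are not Cassandra's actual code: a reader whose reBuffer() takes permits from a RateLimiter that is shared with compaction/streaming throttling. If the normal read path ends up on such a reader, every buffer refill competes for the byte budget that "nodetool setcompactionthroughput" controls.

    import com.google.common.util.concurrent.RateLimiter;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical illustration only; names do not match Cassandra's classes.
    class ThrottledFileReader {
        private final RandomAccessFile file;
        private final RateLimiter limiter;   // shared limiter, rate = compaction throughput in bytes/s
        private final byte[] buffer = new byte[64 * 1024];

        ThrottledFileReader(RandomAccessFile file, RateLimiter limiter) {
            this.file = file;
            this.limiter = limiter;
        }

        int reBuffer() throws IOException {
            // Blocks (Thread.sleep via Guava's sleepUninterruptibly) until enough
            // permits are available. With the limiter set to a low rate, read
            // threads spend most of their time here, which is exactly the
            // RateLimiter.acquire() frame near the top of the dumps below.
            limiter.acquire(buffer.length);
            return file.read(buffer);
        }
    }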

On 3 March 2016 at 14:10, horschi <hors...@gmail.com> wrote:

> Hi,
>
> I just found another one. It's basically the same, but I'll post it anyway:
>
> Thread 84311: (state = BLOCKED)
>  - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
>  - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
>  - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
>  - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
>  - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
>  - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
>  - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
>  - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.big.SimpleSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.composites.Composite) @bci=36, line=57 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=38, line=66 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
>  - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
>  - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
>  - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
>  - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
>  - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
>  - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
>  - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
>  - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
>  - org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow() @bci=16, line=1587 (Compiled frame)
>  - org.apache.cassandra.service.StorageProxy$DroppableRunnable.run() @bci=37, line=2232 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
>  - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
>
> kind regards,
> Christian
>
>
> On Thu, Mar 3, 2016 at 2:31 PM, horschi <hors...@gmail.com> wrote:
>
> > Hi,
> >
> > we just had such an incident again, and this time we were able to grab a
> > jstack thread dump.
> >
> > One thread from this stack dump actually shows a ReadVerbHandler calling
> > the RateLimiter. From my understanding this should not happen:
> >
> > Thread 30346: (state = BLOCKED)
> >  - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> >  - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> >  - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> >  - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> >  - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> >  - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> >  - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> >  - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> >  - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.setToRowStart(org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput) @bci=13, line=108 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.IndexedSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=110, line=84 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=54, line=66 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> >  - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> >  - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> >  - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> >  - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> >  - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> >  - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> >  - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> >  - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> >  - org.apache.cassandra.db.ReadVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=39, line=38 (Compiled frame)
> >  - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=82, line=67 (Compiled frame)
> >  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> >  - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=164 (Compiled frame)
> >  - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run() @bci=12, line=136 (Compiled frame)
> >  - org.apache.cassandra.concurrent.SEPWorker.run() @bci=87, line=105 (Compiled frame)
> >  - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
> >
> >
> > Dominik and I would appreciate it if you could give some feedback. We think
> > it is a bug :-)
> >
> > Cheers,
> > Christian
> >
> >
> > On Thu, Mar 3, 2016 at 1:08 PM, Dominik Keil <dominik.k...@movilizer.com>
> > wrote:
> >
> >> Hi,
> >>
> >> seeing this here as well. Basically, at some point during a repair within
> >> the cluster (incremental, parallel, partitioner range, only one node at a
> >> time) some node (sometimes the repairing node, sometimes another) starts
> >> piling up READs, i.e. pending reads go through the roof. Even though only
> >> one node is affected at a time and enough other, well-behaving nodes are
> >> available to satisfy our quorum reads, this impacts the read performance
> >> of the whole cluster.
> >>
> >> We have multiple C* 2.2 (2.2.5) clusters, basically running the same
> >> application but with different amounts of load. This symptom only appears
> >> on one of our clusters, which has significantly more usage than most
> >> others. This is also our largest cluster, having about 3 times as many
> >> machines as most other ones (and RF 5 instead of RF 3).
> >>
> >> We did not see this while on 2.0, and we only started to see it on that
> >> particular cluster. We didn't see it on any other cluster after upgrading
> >> from 2.0 (but then again, they're not as loaded).
> >>
> >> At first we suspected the incremental repair, because we also had issues
> >> with it causing heap pressure with a 4GB heap. We went to 6GB and the heap
> >> pressure is gone, but the problem persists. Repair remains the suspect, as
> >> this bad behaviour coincides with repair, and specifically with
> >> anticompaction going on.
> >>
> >> We see a clear temporal correlation between open file handles rising,
> >> active validations skyrocketing at the same time, and the beginning of
> >> such an "event". The same goes for the end of that event, which clearly
> >> correlates with the validations being completed and the number of file
> >> handles dropping dramatically. However, both of these "beginning markers"
> >> seem to be required: active validations skyrocketing without open file
> >> handles skyrocketing at the same time does not produce these symptoms.
> >>
> >> The only thing we have found so far that made a difference in these
> >> situations is compaction throughput. When we decreased the compaction
> >> throughput in these events, pending reads piled up even more and even
> >> more quickly. Beyond that we're still pretty much in the dark. Anyway,
> >> something is locking up Cassandra internally.
> >>
> >> We suspect that there may be a "synchronized" somewhere it shouldn't be
> >> (or should be solved differently) but that's just a guess. We'll try to
> >> produce some jstacks, but the events are pretty elusive because they
> >> happen suddenly and don't last very long (except when we're not watching
> >> closely -_-).
> >>
> >> Anyway, here are some graphs to illustrate what I've tried to describe:
> >>
> >> (1) CPU Usage of Cassandra (green) and open file descriptors (yellow,
> >> second y-axis)
> >>
> >>
> >> (2) Threads: new threads per second (orange line, second y-axis)
> >>
> >>
> >> (3) See below; active and pending tasks on second y-axis
> >>
> >>
> >> (4) Compactions and Validations: Active Tasks (Bars, second y-axis) and
> >> Completed Tasks / s (lines)
> >>
> >>
> >> You can see that around 08:03 an event started with a sudden rise in
> >> active validations and multiple sudden increases in open file descriptors.
> >> The event lasts until 08:46, with a sudden drop in open file descriptors
> >> and a huge peak in new threads per second.
> >>
> >> During the event you can see Cassandra's CPU usage drops significantly.
> >> Same goes for GC activity (graph not included here, because STW GC only
> >> happens about once every 50 minutes and then takes only a fraction of a
> >> second).
> >>
> >> As you can see, there's another such event later on, but much smaller and
> >> shorter, and between the events the pattern of validations continues in
> >> the same way without problems. The only difference: no significant change
> >> in open file descriptor count.
> >>
> >> I have system graphs as well but haven't included them because they show
> >> no problems: CPU usage goes down during that event, there is no I/O wait
> >> on the CPU, and disk ops/s as well as throughput actually go down too.
> >>
> >> During the depicted time frame there was a repair (incremental, parallel,
> >> partitioner range) running on a different machine within the cluster.
> >> We've switched back to -pr because when running it without -pr these
> >> events happen more often and are more pronounced, but I think that it's
> >> just affected by the same underlying problem.
> >>
> >> Interestingly we had a similar issue in another cluster last night, which
> >> runs C* 2.1.13 and does NOT yet use incremental repair (just full repair
> >> with -pr).
> >>
> >> Any chance something in the read path is affected by the configured
> >> compaction throughput and/or by running compactions? It definitely seems
> >> that Cassandra is severely restricting itself here.
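
(One way that could happen, purely as a hypothesis rather than a claim about the actual code: if the same reader factory or pool serves both compaction and client reads, and it attaches the compaction limiter to every reader it hands out, then the read path silently inherits the compaction throughput cap. A rough sketch with made-up names:

    import com.google.common.util.concurrent.RateLimiter;

    // Hypothetical sketch; names are invented and not Cassandra's.
    class ReaderLimiters {
        private final RateLimiter compactionLimiter = RateLimiter.create(16 * 1024.0 * 1024.0);
        private final RateLimiter effectivelyUnlimited = RateLimiter.create(1e9);

        RateLimiter limiterFor(boolean forCompaction) {
            // Bug pattern: if this decision is wrong, or a reader built for
            // compaction is later reused for client reads, normal reads end up
            // blocking in compactionLimiter.acquire() like in the thread dumps.
            return forCompaction ? compactionLimiter : effectivelyUnlimited;
        }
    }

The thread dumps further up in this thread, with CompressedThrottledReader.reBuffer calling RateLimiter.acquire from ReadVerbHandler and LocalReadRunnable, look consistent with the read path getting the throttled variant.)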
> >>
> >> Best regards,
> >> Dominik
> >>
> >>
> >> On 26.02.2016 at 17:42, horschi wrote:
> >>
> >> Hi,
> >>
> >> I just observed some weird behaviour on one of our Cassandra nodes, which
> >> I would like to share:
> >>
> >> Short version:
> >> My pending reads went up from ~0 into the hundreds when I reduced the
> >> compactionthroughput from 16 to 2.
> >>
> >>
> >> Long version:
> >>
> >> One of our more powerful nodes had a few pending reads, while the other
> >> ones didn't. So far nothing special.
> >>
> >> Strangely, neither CPU, nor I/O wait, nor disk ops/s, nor C* heap usage
> >> was particularly high. So I was wondering.
> >>
> >> That machine had two compactions and a validation (incremental) running,
> >> so I set the compactionthroughput to 2. To my surprise I saw the pending
> >> reads go up into the hundreds within 5-10 seconds. When I set the
> >> compactionthroughput back to 16, the pending reads went back to 0 (or at
> >> least close to zero).
> >>
> >> I kept the compactionthroughput on 2 for less than a minute. So the issue
> >> is not compactions falling behind.
> >>
> >> I was able to reproduce this behaviour 5-10 times. The pending reads went
> >> up every time I *de*creased the compactionthroughput. I watched the
> >> pending reads while the compactionthroughput was on 16, and I never
> >> observed even a two-digit pending read count while it was at 16.
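
(Side note: the 16-vs-2 numbers line up with how Guava's RateLimiter behaves if reads share the compaction limiter. A tiny standalone demo, not Cassandra code, assuming roughly 64 KB is acquired per buffer refill:

    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottleDemo {
        public static void main(String[] args) {
            int chunk = 64 * 1024;  // bytes assumed to be acquired per buffer refill
            for (int mbPerSec : new int[] {16, 2}) {
                RateLimiter limiter = RateLimiter.create(mbPerSec * 1024.0 * 1024.0);
                limiter.acquire(chunk);              // first acquire returns immediately
                long start = System.nanoTime();
                limiter.acquire(chunk);              // this one waits for the earlier reservation
                long micros = (System.nanoTime() - start) / 1000;
                System.out.printf("%2d MB/s -> about %d us per 64 KB refill%n", mbPerSec, micros);
            }
        }
    }

At 16 MB/s each refill waits roughly 4 ms, at 2 MB/s roughly 32 ms, so a read that touches many chunks falls behind quickly and pending reads pile up, which would match what you observed.)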
> >>
> >> Unfortunately the machine does not show this behaviour any more. Also, it
> >> was only a single machine.
> >>
> >>
> >>
> >> Our setup:
> >> C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6GB heap
> >>
> >>
> >> My question:
> >> Has anyone else ever observed such behaviour?
> >>
> >> Is it perhaps possible that the read-path shares a lock with
> >> repair/compaction that waits on ThrottledReader while holding that lock?
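
(For illustration, the suspected pattern would look roughly like the sketch below. This is purely hypothetical code, not taken from Cassandra: a thread that blocks inside a rate limiter while holding a shared lock would make every other reader queue up behind that lock, even though CPU and disk stay idle.

    import com.google.common.util.concurrent.RateLimiter;

    // Hypothetical sketch of the suspected pattern, not actual Cassandra code.
    class SharedSegmentCache {
        private final Object lock = new Object();
        // Limiter shared with compaction/streaming, e.g. 2 MB/s after
        // "nodetool setcompactionthroughput 2".
        private final RateLimiter sharedLimiter = RateLimiter.create(2 * 1024.0 * 1024.0);

        byte[] readSegment(long position) {
            synchronized (lock) {                    // lock shared by all readers
                sharedLimiter.acquire(64 * 1024);    // sleeps while the lock is held
                return new byte[64 * 1024];          // stand-in for the actual disk read
            }
        }
    }

If something like this were in the read path, lowering the compaction throughput would lengthen the sleep inside the critical section and pending reads would jump, which matches what both of you describe.)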
> >>
> >>
> >> kind regards,
> >> Christian
> >>
> >>
> >> --
> >> *Dominik Keil*
> >> Phone: + 49 (0) 621 150 207 31
> >> Mobile: + 49 (0) 151 626 602 14
> >>
> >> Movilizer GmbH
> >> Julius-Hatry-Strasse 1
> >> 68163 Mannheim
> >> Germany
> >>
> >> movilizer.com
> >>
> >
> >
> >
>
