[
https://issues.apache.org/jira/browse/CASSANDRA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Shuler updated CASSANDRA-13538:
---------------------------------------
Fix Version/s: (was: 2.1.18)
2.1.x
> Cassandra tasks permanently block after the following assertion occurs during
> compaction: "java.lang.AssertionError: Interval min > max "
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-13538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13538
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: This happens on a 7 node system with 2 data centers.
> We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still
> occurs.
> Reporter: Andy Klages
> Fix For: 2.1.x
>
> Attachments: cassandra.yaml, jstack.out, schema.cql3, system.log,
> tpstats.out
>
>
> We noticed this problem because the commitlogs proliferate to the point that
> we eventually run out of disk space. nodetool tpstats shows several of the
> tasks backed up:
> {code}
> Pool Name                 Active  Pending  Completed  Blocked  All time blocked
> MutationStage                  0        0  134335315        0                 0
> ReadStage                      0        0  643986790        0                 0
> RequestResponseStage           0        0     114298        0                 0
> ReadRepairStage                0        0         36        0                 0
> CounterMutationStage           0        0          0        0                 0
> MiscStage                      0        0          0        0                 0
> AntiEntropySessions            1        1      79357        0                 0
> HintedHandoff                  0        0         90        0                 0
> GossipStage                    0        0    6595098        0                 0
> CacheCleanupExecutor           0        0          0        0                 0
> InternalResponseStage          0        0    1638369        0                 0
> CommitLogArchiver              0        0          0        0                 0
> CompactionExecutor             2      175    2922542        0                 0
> ValidationExecutor             0        0    1465374        0                 0
> MigrationStage                 1       76        600        0                 0
> AntiEntropyStage               1      923    8291098        0                 0
> PendingRangeCalculator         0        0         20        0                 0
> Sampler                        0        0          0        0                 0
> MemtableFlushWriter            0        0      53017        0                 0
> MemtablePostFlush              1     4584    1545141        0                 0
> MemtableReclaimMemory          0        0      70639        0                 0
> Native-Transport-Requests      0        0     352559        0                 0
> {code}
> This all starts after the following exception is raised in Cassandra:
> {code}
> ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
> java.lang.AssertionError: Interval min > max
>     at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1521) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
>     at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {code}
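For readers unfamiliar with the message in the trace above, here is a minimal sketch of the kind of invariant that an "Interval min > max" check expresses. `DemoInterval` is a hypothetical type written for illustration; it is not Cassandra's `IntervalTree` code, and how Cassandra derives the bounds it checks is not shown in this report.

```java
// Hypothetical minimal interval type for illustration only;
// this is NOT Cassandra's IntervalTree implementation.
final class DemoInterval<C extends Comparable<C>> {
    final C min;
    final C max;

    DemoInterval(C min, C max) {
        // Explicit check rather than the `assert` keyword, so it fires
        // even when the JVM runs without -ea.
        if (min.compareTo(max) > 0) {
            throw new AssertionError("Interval min > max");
        }
        this.min = min;
        this.max = max;
    }

    public static void main(String[] args) {
        DemoInterval<Integer> ok = new DemoInterval<Integer>(1, 5);
        System.out.println("valid interval: [" + ok.min + ", " + ok.max + "]");
        try {
            // Lower bound sorts after upper bound, so construction fails.
            new DemoInterval<Integer>(5, 1);
        } catch (AssertionError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that such a check fails at construction time, which is consistent with the trace failing inside a constructor (`<init>`).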
> This has occurred on only one of our system testers' setups, but it occurs
> there regularly. I can't tell you how to reproduce it: we have many systems
> deployed, and only this one setup encounters the issue. I have included the
> jstack output, config file, log file, and schema. I also have a heap dump
> available if needed. After looking at the heap dump, the best I can tell is
> that the assertion failure left a lock (i.e. a latch) in a locked state,
> which then causes a backlog of pending tasks.
> I'm hoping this assertion will mean something to the Cassandra development
> community and that it has perhaps been fixed in a newer release.
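The reporter's hypothesis (an error thrown before a latch is released leaves dependent tasks waiting) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `LatchLeakDemo`, `waiterReleased`, and `step` are invented names, and this is not a reconstruction of Cassandra's flush path. It only shows the general failure mode with `java.util.concurrent.CountDownLatch`, and how a `try`/`finally` avoids it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo; none of these names come from Cassandra.
class LatchLeakDemo {

    // Submits one "flush" task that should release the latch, then reports
    // whether a waiter on that latch was released within 500 ms.
    static boolean waiterReleased(boolean failBeforeCountDown, boolean useFinally) {
        CountDownLatch flushed = new CountDownLatch(1);
        ExecutorService flushWriter = Executors.newSingleThreadExecutor();
        try {
            flushWriter.execute(() -> {
                if (useFinally) {
                    try {
                        step(failBeforeCountDown);
                    } finally {
                        flushed.countDown(); // released even if step throws
                    }
                } else {
                    step(failBeforeCountDown);
                    flushed.countDown();     // skipped if step throws
                }
            });
            return flushed.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            flushWriter.shutdownNow();
        }
    }

    // Stand-in for work that may fail with an assertion error.
    static void step(boolean fail) {
        if (fail) {
            throw new AssertionError("Interval min > max");
        }
    }

    public static void main(String[] args) {
        System.out.println("no failure:            " + waiterReleased(false, false));
        System.out.println("failure, no finally:   " + waiterReleased(true, false));
        System.out.println("failure, with finally: " + waiterReleased(true, true));
    }
}
```

In the failing case without `finally`, the waiter times out here only because the sketch uses a bounded `await`; a waiter using an unbounded `await()` would block indefinitely, and work queued behind it would accumulate as pending tasks.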
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)