[ https://issues.apache.org/jira/browse/CASSANDRA-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-5736.
-----------------------------------------
Resolution: Duplicate
Yeah, pretty sure this is the same problem as CASSANDRA-5677, so closing this
as a duplicate.
> CQL3PagingRecordReader can OOM and kill nodes
> ---------------------------------------------
>
> Key: CASSANDRA-5736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5736
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 1.2.6
> Reporter: Michael Kjellman
>
> It looks like the CQL3PagingRecordReader will end up OOMing many nodes in a
> cluster through an OOM/GC storm in ReadStage.
> This is the stack trace from all of the ReadStage threads:
> {code}
> org.apache.cassandra.db.marshal.DateType.compare(DateType.java:62)
> org.apache.cassandra.db.marshal.DateType.compare(DateType.java:32)
> org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:229)
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:81)
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
> java.util.TimSort.mergeHi(TimSort.java:806)
> java.util.TimSort.mergeAt(TimSort.java:485)
> java.util.TimSort.mergeForceCollapse(TimSort.java:426)
> java.util.TimSort.sort(TimSort.java:223)
> java.util.TimSort.sort(TimSort.java:173)
> java.util.Arrays.sort(Arrays.java:659)
> java.util.Collections.sort(Collections.java:217)
> org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:255)
> org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:281)
> org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:280)
> org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72)
> org.apache.cassandra.utils.IntervalTree.build(IntervalTree.java:81)
> org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:181)
> org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:40)
> org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:51)
> org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:182)
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:86)
> org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:106)
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:79)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> org.apache.cassandra.db.ColumnFamilyStore$6.computeNext(ColumnFamilyStore.java:1432)
> org.apache.cassandra.db.ColumnFamilyStore$6.computeNext(ColumnFamilyStore.java:1428)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1499)
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1476)
> org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:46)
> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:58)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:722)
> {code}
> As best I can tell, this affects any row with more than 5 or so tombstones and
> has something to do with DeletionInfo trying to sort the results. The only way
> to fix this was a rolling restart of all of the nodes in the cluster, as the
> ReadStage threads appeared to be making no progress (most likely due to GC).
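The stack trace shows DeletionInfo.add calling IntervalTree.build, which ends in Collections.sort. If the tree is rebuilt from scratch on every add, as that call path suggests, adding n tombstones to a row costs roughly quadratic work instead of about n log n. The sketch below is a simplified, hypothetical model of that effect, not Cassandra's code: each range tombstone is reduced to an int start bound, and the names TombstoneRebuildCost, RebuildingDeletionInfo and IncrementalDeletionInfo are made up for illustration. It counts comparator calls for a "re-sort everything on each add" strategy versus a sorted-insert strategy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not Cassandra code: a tombstone is just its start bound.
public class TombstoneRebuildCost {
    private static long comparisons = 0;

    // Comparator that counts how often the sort/search calls it.
    private static final Comparator<Integer> COUNTING = (a, b) -> {
        comparisons++;
        return Integer.compare(a, b);
    };

    // Mimics the rebuild-per-add pattern suggested by the stack trace:
    // every add copies all tombstones and sorts them again.
    static final class RebuildingDeletionInfo {
        private final List<Integer> starts = new ArrayList<>();

        void add(int start) {
            List<Integer> rebuilt = new ArrayList<>(starts);
            rebuilt.add(start);
            Collections.sort(rebuilt, COUNTING); // O(size) comparisons at best, every time
            starts.clear();
            starts.addAll(rebuilt);
        }
    }

    // Alternative strategy: keep the list sorted and binary-search the insertion point.
    static final class IncrementalDeletionInfo {
        private final List<Integer> starts = new ArrayList<>();

        void add(int start) {
            int idx = Collections.binarySearch(starts, start, COUNTING); // O(log size) comparisons
            if (idx < 0) {
                idx = -idx - 1; // convert "not found" result to insertion point
            }
            starts.add(idx, start);
        }
    }

    // Comparisons needed to add n tombstones with decreasing start bounds.
    public static long rebuildCost(int n) {
        comparisons = 0;
        RebuildingDeletionInfo info = new RebuildingDeletionInfo();
        for (int i = n; i > 0; i--) {
            info.add(i);
        }
        return comparisons;
    }

    public static long incrementalCost(int n) {
        comparisons = 0;
        IncrementalDeletionInfo info = new IncrementalDeletionInfo();
        for (int i = n; i > 0; i--) {
            info.add(i);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("n=%d rebuild=%d incremental=%d%n",
                    n, rebuildCost(n), incrementalCost(n));
        }
    }
}
```

In this model the rebuilding variant's comparison count grows roughly with the square of n (about 100x more work for 10x more tombstones), while the sorted-insert variant stays close to n log n. The real IntervalTree does more work per build than a single sort, so this only illustrates the growth pattern, not actual costs.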
--
This message is automatically generated by JIRA.