[ https://issues.apache.org/jira/browse/CASSANDRA-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-8269.
---------------------------------------
    Resolution: Won't Fix

If you don't have enough heap to deserialize the partition summary you need to increase column_index_size_in_kb, stop generating huge partitions, or both.

(We're addressing this for hints with CASSANDRA-6230.)

> Large number of system hints & other CF's cause heap to fill and run OOM
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8269
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8269
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: DSE 4.5.0 with Apache Cassandra 2.0.5
>            Reporter: Jose Martinez Poblete
>         Attachments: alln01-ats-cas2-java_1414110068_Leak_Suspects.zip, system.log
>
>
> A 3-node cluster with a large number of sstables for system.hints and 3 other
> user tables was going down regularly with OOM, with the system log showing the
> following:
> {noformat}
> ERROR [OptionalTasks:1] 2014-10-23 18:51:29,052 CassandraDaemon.java (line 199) Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:122)
>         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:229)
>         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:203)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.cassandra.io.sstable.SSTableScanner.hasNext(SSTableScanner.java:183)
>         at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
>         at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
>         at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:74)
>         at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1586)
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1709)
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1643)
>         at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:513)
>         at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:91)
>         at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:173)
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> A heap dump showed the following:
> {noformat}
> Class Name                                                                            | Shallow Heap | Retained Heap | Percentage
> ----------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x67b292138 OptionalTasks:1 Thread                                 |          104 | 4,901,485,768 |     58.60%
> |- org.apache.cassandra.utils.MergeIterator$ManyToOne @ 0x7b9dc4ad8                   |           40 | 4,900,817,312 |     58.59%
> |  |- java.util.ArrayList @ 0x6f05f15f0                                               |           24 |   403,635,848 |      4.83%
> |  |- org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator @ 0x7b5fe7078|           40 |    29,669,312 |      0.35%
> |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7caaa28             |           32 |    26,770,264 |      0.32%
> |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7f6e670             |           32 |     2,898,864 |      0.03%
> |  |  |  '- java.util.ArrayList @ 0x7b7caaae0                                         |           24 |     2,898,832 |      0.03%
> |  |  |     '- java.lang.Object[12283] @ 0x7b7caaaf8                                  |       49,152 |     2,898,808 |      0.03%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6af8 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6be0 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6cc8 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6db0 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6e98 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6f80 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7068 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7150 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7238 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7320 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7408 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb74f0 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb75d8 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb76c0 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb77a8 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7890 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7978 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7a60 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7b48 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7c30 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7d18 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7e00 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7ee8 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7fd0 |           40 |           232 |      0.00%
> |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb80b8 |           40 |           232 |      0.00%
> |  |  |        '- Total: 25 of 12,283 entries; 12,258 more                            |              |               |
> ----------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> We suspected the large number of system table sstables to be an issue:
> {noformat}
> alln01-ats-cas2:
> ============
> [root@alln01-ats-cas2 ~]# sstableReport | tee /tmp/sstableReport.txt
> Data directory: /cassandra/data
> Total sstable files: 45662
> Itemized:
> ks_r_only  test_results_verify               FileCount: 3
> mfgprod    test_results                      FileCount: 292
> mfgprod    test_results_logs                 FileCount: 4
> mfgprod    test_results_new                  FileCount: 12
> mfgprod    test_results_new2                 FileCount: 6
> mfgprod    test_results_new3                 FileCount: 6
> mfgprod    test_results_new4                 FileCount: 9633
> mfgprod    test_results_new5                 FileCount: 9667
> mfgprod    test_results_new6                 FileCount: 8867
> mfgprod    test_results_verify_threads       FileCount: 1
> mfgprod    test_results_verify_threads_new5  FileCount: 1
> mfgprod    test_results_verify_threads_new6  FileCount: 24
> OpsCenter  bestpractice_results              FileCount: 1
> OpsCenter  events                            FileCount: 6
> OpsCenter  events_timeline                   FileCount: 2
> OpsCenter  pdps                              FileCount: 7
> OpsCenter  rollups300                        FileCount: 10
> OpsCenter  rollups60                         FileCount: 29
> OpsCenter  rollups7200                       FileCount: 1
> OpsCenter  rollups86400                      FileCount: 1
> OpsCenter  settings                          FileCount: 10
> pkm_test   pkm1                              FileCount: 1
> stressd    Standard1                         FileCount: 2
> stress     Standard1                         FileCount: 1
> system     batchlog                          FileCount: 165
> system     compaction_history                FileCount: 2
> system     compactions_in_progress           FileCount: 5
> system     hints                             FileCount: 16856
> system     IndexInfo                         FileCount: 1
> system     local                             FileCount: 2
> system     peer_events                       FileCount: 3
> system     peers                             FileCount: 4
> system     schema_columnfamilies             FileCount: 3
> system     schema_columns                    FileCount: 3
> system     schema_keyspaces                  FileCount: 3
> system     sstable_activity                  FileCount: 28
> {noformat}
> The system became stable after we got rid of the system hints and compacted
> the other 3 user tables:
> {noformat}
> mfgprod    test_results_new4                 FileCount: 9633
> mfgprod    test_results_new5                 FileCount: 9667
> mfgprod    test_results_new6                 FileCount: 8867
> system     hints                             FileCount: 16856
> {noformat}
> The heap dump is too large to attach.
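
The resolution's advice to raise column_index_size_in_kb can be made concrete with some rough arithmetic on the heap dump in the report. The sketch below is only an estimate under stated assumptions: it assumes roughly one IndexInfo entry per column_index_size_in_kb of partition data, takes the 232-byte retained size per entry from the dump (real entries vary with column name size), and assumes the cluster ran with the 64 KB default. The function names are hypothetical and not part of Cassandra.

```python
import math

# Retained size of one IndexHelper$IndexInfo as reported in the heap dump (bytes).
# This is an observed value from that dump, not a constant of Cassandra.
INDEX_INFO_BYTES = 232


def index_entries(partition_bytes, column_index_size_kb=64):
    """Approximate IndexInfo count for one partition.

    Assumes about one index entry per column_index_size_kb of data.
    """
    block_bytes = column_index_size_kb * 1024
    return math.ceil(partition_bytes / block_bytes)


def index_heap_bytes(partition_bytes, column_index_size_kb=64,
                     entry_bytes=INDEX_INFO_BYTES):
    """Approximate heap held by the index entries of one partition."""
    return index_entries(partition_bytes, column_index_size_kb) * entry_bytes


if __name__ == "__main__":
    # Partition size implied by 12,283 entries at an assumed 64 KB block size.
    partition = 12_283 * 64 * 1024
    for kb in (64, 256, 1024):
        print(kb, index_entries(partition, kb), index_heap_bytes(partition, kb))
```

The dump is consistent with this model: 12,283 entries at 232 bytes each, plus the 49,152-byte Object[] array, gives the 2,898,808 bytes retained by that array. If the 64 KB default was in effect, that single partition would be on the order of 768 MB, and each scanner deserializes such an index per indexed partition it reads. Raising the block size shrinks the entry count proportionally, at the cost of coarser seeks within a partition.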