[ https://issues.apache.org/jira/browse/CASSANDRA-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-9120:
---------------------------------------
    Description: 
Found during tests on a 100-node cluster. After a restart, one node constantly crashed with an OutOfMemoryError. I suspect the auto-saved cache was corrupted and Cassandra could not recognize that. Similar issues (a negative size being read for some structure) have already been fixed. Does the auto-saved cache have a checksum? One would help reject a corrupted cache at the very beginning.
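For illustration, a minimal sketch of the kind of framing a checksum would enable. The file layout and helper names here are hypothetical, not Cassandra's actual cache format; the point is that a CRC32 written after the payload lets the reader reject a corrupted file before deserializing anything:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class ChecksumExample {
    // Hypothetical framing: [length][payload][CRC32 of payload].
    static byte[] withChecksum(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(payload.length);
        out.write(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        out.writeLong(crc.getValue());
        return bos.toByteArray();
    }

    // Verify the checksum up front; a mismatch means the saved cache
    // should be discarded, not deserialized.
    static byte[] readChecked(byte[] framed) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        int len = in.readInt();
        if (len < 0 || len > framed.length)
            throw new IOException("corrupt length: " + len);
        byte[] payload = new byte[len];
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != in.readLong())
            throw new IOException("checksum mismatch; discarding saved cache");
        return payload;
    }
}
```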

As far as I can see, the current code still has this problem. The stack trace is:
{code}
INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache /storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space
        at java.util.ArrayList.<init>(Unknown Source)
        at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
        at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
        at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
        at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
        at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
{code}
I looked at the Cassandra source code:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java
{code}
119 int entries = in.readInt();
120 List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<IndexHelper.IndexInfo>(entries);
{code}

It seems the entries value is invalid (negative), so the code tries to allocate an ArrayList with a huge initial capacity and hits the OOM. After deleting the saved_caches directory I was able to start the node correctly. We should expect this to happen in the real world: Cassandra should be able to skip corrupt cached data and start.
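Independently of a checksum, the deserializer could fail fast on its own. A minimal sketch (hypothetical method, element type, and bound, not the actual RowIndexEntry serializer) of validating the count before allocating:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SafeDeserialize {
    // Upper bound chosen purely for illustration; a real limit would be
    // derived from the file size or a format-level maximum.
    static final int MAX_ENTRIES = 1 << 20;

    static List<Long> deserializeIndex(DataInput in) throws IOException {
        int entries = in.readInt();
        // Reject negative or implausibly large counts before allocating,
        // so a corrupted cache yields an IOException the caller can catch
        // (and discard the cache) instead of an OutOfMemoryError.
        if (entries < 0 || entries > MAX_ENTRIES)
            throw new IOException("invalid entry count in saved cache: " + entries);
        List<Long> columnsIndex = new ArrayList<>(entries);
        for (int i = 0; i < entries; i++)
            columnsIndex.add(in.readLong());
        return columnsIndex;
    }
}
```

The caller (the cache loader, in this sketch) catches the IOException, logs a warning, and continues startup with a cold cache.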


> OutOfMemoryError when read auto-saved cache (probably broken)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-9120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9120
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Linux
>            Reporter: Vladimir
>             Fix For: 2.0.14
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
