Hi,

  I've been having nodes fail recently with OOM exceptions (not sure why,
but we have had an increase in traffic, so that could be the cause).
Most nodes have restarted fine; one node, however, has been having problems
restarting.  It was failing with

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.io.DataInputStream.readUTF(DataInputStream.java:644)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:104)
        at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:308)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:271)
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:338)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:65)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:90)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)

And

java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.encode(StringCoding.java:266)
        at java.lang.StringCoding.encode(StringCoding.java:284)
        at java.lang.String.getBytes(String.java:987)
        at org.apache.cassandra.utils.FBUtilities.hash(FBUtilities.java:178)
        at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:116)
        at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:44)
        at org.apache.cassandra.db.Memtable.resolve(Memtable.java:148)
        at org.apache.cassandra.db.Memtable.put(Memtable.java:143)
        at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:478)
        at org.apache.cassandra.db.Table.apply(Table.java:445)
        at org.apache.cassandra.db.CommitLog$3.run(CommitLog.java:365)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I upped the Xmx value from 4G to 6G and it seems to be doing okay now;
however, it seems odd that the node can run mostly fine with 4G but then
fail to restart with that much memory.  Maybe this ticket's issue is back?

https://issues.apache.org/jira/browse/CASSANDRA-609

Anyway, I'm hoping things will recover with 6G, after which I can restart
with 4G and everything will be fine.
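
(For anyone wanting to do the same: the change is just the JVM's max-heap
flag, typically the -Xmx entry in the JVM_OPTS set by bin/cassandra.in.sh,
though your startup script may set it elsewhere.  In my case it was simply

 -Xmx4G   ->   -Xmx6G

before restarting the node.)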

I'd also like a better understanding of why Cassandra might OOM in general.
Are there settings that minimize the chances of an OOM?  This instance has
2 column families, and I have

 <MemtableSizeInMB>512</MemtableSizeInMB>
 <MemtableObjectCountInMillions>1.0</MemtableObjectCountInMillions>
 <MemtableFlushAfterMinutes>1440</MemtableFlushAfterMinutes>

So if I understand these settings, a memtable can be at most 512MB in size,
or hold at most 1 million objects, before it is flushed to disk, and the
maximum time before a flush is 24 hours.  Does that mean that if I fill up
8G, or 16 memtables, in less than 24 hours, I've basically used all the
memory available to me?  I assume there are other things using memory
(indexes, etc.); how is that limited?  Anyway, any information about what
is used where would be appreciated.
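
To spell out the arithmetic behind that question (purely back-of-envelope,
so correct me if I'm misreading the settings):

 512MB per memtable x 16 memtable flushes  = 8G written inside the 24-hour window
 2 column families x 512MB active memtable = ~1G live at any one moment

so what I'm really asking is whether a flushed memtable's heap is released
promptly, or whether a large chunk of that 8G can end up resident at once.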

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <antho...@alumni.caltech.edu>
