It turns out there were several bugs that caused 0.3 to run out of memory during sustained inserts. These are fixed in trunk, which is almost stable (#233 is the last disk-format change, and will be committed as soon as review is done).
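For what it's worth, the numbers in the quoted report suggest the memtable thresholds themselves were never the binding constraint. A rough back-of-the-envelope sketch (all figures taken from the report below; the arithmetic is illustrative, not a measurement):

```python
# Sanity-check the reported workload against the configured memtable
# thresholds. All numbers come from the quoted report; this is only a
# rough illustration, not a model of Cassandra's actual accounting.

keys = 880_000                     # unique keys in the test data
columns_per_key = 5                # same 5 columns inserted per key
avg_record_bytes = 40              # reported average record size

memtable_size_mb = 1024            # <MemtableSizeInMB>1024
memtable_object_limit = 2_000_000  # <MemtableObjectCountInMillions>2

total_columns = keys * columns_per_key
raw_data_mb = keys * avg_record_bytes / 1024 / 1024

print(f"total columns inserted: {total_columns:,}")   # 4,400,000
print(f"raw data size: ~{raw_data_mb:.0f} MB")        # ~34 MB

# The object-count trigger should fire long before the size trigger:
# ~4.4M columns against a 2M object limit, while only ~34 MB of raw
# data accumulates against a 1024 MB size limit. So memtables should
# have been flushing regularly on object count; memory growing without
# bound anyway points at retention elsewhere (the bugs fixed in trunk).
assert total_columns > memtable_object_limit
assert raw_data_mb < memtable_size_mb
```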
-Jonathan

On Mon, Aug 10, 2009 at 7:20 PM, Huming Wu <huming...@gmail.com> wrote:
> I am currently doing some tests on Cassandra (0.3.0-final): two nodes,
> each with 8 GB RAM and 8 CPU cores. Here are some settings from my
> storage-conf.xml:
>
> <ReplicationFactor>2</ReplicationFactor>
> <ColumnIndexSizeInKB>256</ColumnIndexSizeInKB>
> <MemtableSizeInMB>1024</MemtableSizeInMB>
> <MemtableObjectCountInMillions>2</MemtableObjectCountInMillions>
>
> The test data has about 880K unique keys, and my test program simply
> inserts the same 5 columns into Table1.Standard1 using
> thrift.batch_insert. For each key, the record size ranges from 21
> bytes to 5K, with the average being 40 bytes. The program calls
> batch_insert repeatedly - 4 million times over 50 concurrent thrift
> connections (about 220 MB of data, excluding keys, is sent to
> Cassandra). What I see is that the Java resident memory grows to the
> GC limit (6G) and everything just halts after that. If I restart
> Cassandra I can see the footprint is around 1.9G and I can insert
> again, but the memory keeps growing, and so on.
> Here are my JVM settings:
>
> -Xmx6000m -Xms6000m -XX:+HeapDumpOnOutOfMemoryError -XX:NewSize=1000m
> -XX:MaxNewSize=1000m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails -Xloggc:gc.log
>
> Here is the jmap output (top 10 objects):
>
>  num   #instances   #bytes       class name
>  --------------------------------------
>    1:     1436005    658220304   [Ljava.lang.Object;
>    2:    12100491    484019640   java.lang.String
>    3:     9904511    437577600   [C
>    4:     2709812    322398784   [I
>    5:     5607988    224319520   java.util.concurrent.ConcurrentSkipListMap$Node
>    6:     4469810    214550880   org.apache.cassandra.db.Column
>    7:     3339219    213710016   org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
>    8:     3339230    191142200   [J
>    9:     4506140    179147648   [B
>   10:     2220024    106561152   java.util.concurrent.ConcurrentSkipListMap$HeadIndex
>
> Does anyone have any idea why Cassandra uses so much memory? From
> gc.log I do see GC has kicked in many times (but no major compaction,
> though). I'd expect that with this small data set everything would
> just work fine with the available memory (I mean, the test should
> just run on for weeks).
>
> Any suggestions?
>
> Thanks,
> Huming