A few things to try:

1. Enable verbose GC logging to see whether your JVM is dying under GC load.
2. Run pkill -3 java. It sends SIGQUIT to every java process, and each JVM then dumps stack traces for all of its running threads to its stdout; there could be some clues there.
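As a rough sketch of both suggestions, assuming the startup script (cassandra.in.sh) collects its JVM flags in a variable, called JVM_OPTS here, and that /var/log/cassandra is writable. The log path and the helper name dump_java_threads are made up for illustration. The GC flags are the classic HotSpot ones for the Java 6 era JVM seen in the stack trace; newer JVMs use -Xlog:gc* instead.

```shell
# Hypothetical additions to cassandra.in.sh; the log path is an assumption.
GC_LOG=/var/log/cassandra/gc.log

# Classic HotSpot GC logging flags (Java 5 through 8).
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$GC_LOG"

# Append to whatever JVM options the script already sets.
JVM_OPTS="$JVM_OPTS $GC_OPTS"

# Send SIGQUIT (signal 3) to every process named "java". The JVM keeps
# running and writes a stack trace for each thread to its stdout.
dump_java_threads() {
    pkill -3 java
}
```

After a restart, back-to-back full collections in the GC log that reclaim very little would point at heap pressure. Calling dump_java_threads two or three times a few seconds apart shows which threads stay busy between dumps.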

Dan Larsen wrote:
> Hi again :-)
>
> O.k... New problem...
> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
> Running CommitLog on 1 disk and data on another.
> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
> 6GB set in the cassandra.in.sh.
>
> I started throwing data at it, without problems.
> All of a sudden, the node becomes irresponsive.
>
> I only have 6.6GB of data in the DBs.
>
> I experienced the same thing, while running much smaller nodes.
>
> I tried restarting cassandra (kill [pid]).
>
> When it starts up, it goes crazy for a while, trying to fill up the
> RAM or something.
> Then it stops filling RAM, but keeps a load of ~100% CPU.
> It doesn't respond to anything, but a nodeprobe info, which responds,
> but VERY slowly.
>
>
> The log doesn't give me anything - not that I can understand anyways...
>
> [.....]
> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142) Cassandra starting up...
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 369) LocationInfo has reached its threshold; switching in a fresh Memtable
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable(LocationInfo)@2116316013
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java (line 186) Flushing Memtable(LocationInfo)@2116316013
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line 466) discard completed log segments for CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log', position=257), column family 0. CFIDs are system: TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher: TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6, SentencePages: 5, }), }
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line 509) Marking replay position 257 on commit log /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java (line 220) Completed flushing /mnt/cassandra/data/system/LocationInfo-19-Data.db
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line 40) Disseminating load info ...
>
>
> If I try to insert anything, I get stuff like this:
>
> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java (line 179) error writing key md5
> java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>     at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>     at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> Any ideas?
>
> Best regards
> Dan

-- 
Eric Bowman
Boboco Ltd
[email protected]
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532
