That is exactly, what I have discovered by now ;-)
Looking forward for the next release then!

Thanks Jonathan!

Best
Dan


On 09/10/2009, at 16.21, Jonathan Ellis wrote:

if you swamp it with inserts faster than it can write them, it will
start spending more and more time trying to GC.  that's what's
happening...  trunk is smarter about this and will stop accepting
writes before it gets to that point, but for 0.4 you just need to be a
little careful.

-Jonathan

On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen <[email protected]> wrote:
Thanks for the tips Eric.

I was just about to try it, when I noticed, it had become responsive again.
It took exactly 1 hour, before it was done!...

But when I restart now, it's ready almost immediatly... Weird stuff!!

I will try out your tips, next time this happens!

It sounds like, it's pretty well-defined, when the JVM dies under GC load..?
Any pointers there?
I was just thinking, that it might be possible, to add nodes based on
current knowledge?

#Dan

On 09/10/2009, at 13.58, Eric Bowman wrote:

A few things to try:

1. Enable verbose GC logging to see if your JVM is dying under GC load.
2. pkill -3 java will dump some nice stack traces from all running
threads, could be some clues there.


Dan Larsen wrote:

Hi again :-)

O.k... New problem...
I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
Running CommitLog on 1 disk and data on another.
Cassandra 0.4.0 - (yes I have checked... correct version :-P)
6GB set in the cassandra.in.sh.

I started throwing data at it, without problems.
All of a sudden, the node becomes irresponsive.

I only have 6.6GB of data in the DBs.

I experienced the same thing, while running much smaller nodes.

I tried restarting cassandra (kill [pid]).

When it starts up, it goes crazy for a while, trying to fill up the
RAM or something.
Then it stops filling RAM, but keeps a load of ~100% CPU.
It doesn't respond to anything, but a nodeprobe info, which responds,
but VERY slowly.


The log doesn't give me anything - not that I can understand anyways...

[.....]
INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
Cassandra starting up...
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
ColumnFamilyStore.java (line 369) LocationInfo has reached its
threshold; switching in a fresh Memtable
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
ColumnFamilyStore.java (line 1178) Enqueuing flush of
Memtable(LocationInfo)@2116316013
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
(line 186) Flushing Memtable(LocationInfo)@2116316013
DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
466) discard completed log segments for

CommitLogContext(file='/var/lib/cassandra/commitlog/ CommitLog-1255087417263.log',
position=257), column family 0. CFIDs are system:
TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
SentencePages: 5, }), }
DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
509) Marking replay position 257 on commit log
/var/lib/cassandra/commitlog/CommitLog-1255087417263.log
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
(line 220) Completed flushing
/mnt/cassandra/data/system/LocationInfo-19-Data.db
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for file
: /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for file
: /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for file
: /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for file
: /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
40) Disseminating load info ...


If I try to insert anything, I get stuff like this:

ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
(line 179) error writing key md5
java.util.concurrent.TimeoutException: Operation timed out - received
only 0 responses from .
at

org.apache.cassandra.service.QuorumResponseHandler.get (QuorumResponseHandler.java:88)

at

org.apache.cassandra.service.StorageProxy.insertBlocking (StorageProxy.java:164)

at

org.apache.cassandra.service.CassandraServer.doInsert (CassandraServer.java:468)

at

org.apache.cassandra.service.CassandraServer.insert (CassandraServer.java:421)

at

org.apache.cassandra.service.Cassandra$Processor$insert.process (Cassandra.java:824)

at

org.apache.cassandra.service.Cassandra$Processor.process (Cassandra.java:627)

at

org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run (TThreadPoolServer.java:253)

at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask (ThreadPoolExecutor.java:886)

at

java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)


Any ideas?

Best regards
Dan


--
Eric Bowman
Boboco Ltd
[email protected]
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532






Reply via email to