RE: Anybody experience one Cassandra server locking up?

Brian Frank Cooper Wed, 19 Aug 2009 17:20:13 -0700

We are trying to learn what we can about the performance of Cassandra. I hope 
to have some results to share publicly in the next couple of weeks.

The 0.4 version seems to have handled the insert load better, but is having 
trouble with a 50/50 read/write workload. One server again has a busy core with 
the other 7 cores (and the other servers) idle or near idle. Any ideas? The 
problem seems to come when we dial up the request rate made by the client; 
after a certain point, the achievable throughput slows way down, even lower 
than what we could have achieved with a lower request rate. (Incidentally, we 
are reading and writing 10 KB records; does the large data size have any 
impact?) Using top -H, it looks like a single Java thread is consistently
busy. Maybe it is GC again.
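
The top -H observation and the GC suspicion can both be checked with the stock JDK tools. Below is a minimal sketch, assuming a HotSpot JDK 6 with jstack and jstat on the PATH. The helper names (tid_to_nid, show_thread, flag_full_old_gen) and the 90% threshold are hypothetical, chosen only for illustration. The script converts the decimal thread ID that top -H reports into the hex "nid=0x..." form used in HotSpot thread dumps. It also flags jstat samples where the old generation is nearly full.

```shell
#!/bin/sh
# Sketch for checking whether the busy Java thread is the garbage collector.
# Assumes a HotSpot JDK 6 with jstack and jstat on the PATH; the function
# names and the 90% threshold are illustrative, not part of Cassandra.

# top -H lists per-thread IDs in decimal; HotSpot thread dumps (jstack or
# kill -3) show the same ID in hex as "nid=0x...".
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the thread-dump entry for one thread of a running JVM.
# Usage: show_thread <java-pid> <thread-id-from-top-H>
show_thread() {
    nid=$(tid_to_nid "$2")
    jstack "$1" | grep -A 20 "nid=$nid "
}

# Read `jstat -gcutil` output on stdin and report samples where old-gen
# occupancy (column O) is at or above the threshold given as $1.
# JDK 6 column layout: S0 S1 E O P YGC YGCT FGC FGCT GCT
# (later JDKs changed the columns, so check the header first).
flag_full_old_gen() {
    awk -v limit="$1" '
        $1 == "S0" { next }    # skip the header line
        $4 + 0 >= limit + 0 {
            printf "old gen %s%% full, full GCs so far: %s\n", $4, $8
        }'
}

# Example usage against a live JVM:
#   show_thread <java-pid> <busy-thread-id>
#   jstat -gcutil <java-pid> 1000 | flag_full_old_gen 90
```

Suppose the busy thread turns out to be a collector thread (with CMS, "Concurrent Mark-Sweep GC Thread") or the "VM Thread". Suppose also that the old generation stays near full while the FGC count keeps climbing. Together, those would support the GC explanation. If the JVM is as unresponsive as the 0.3 instance was (ignoring kill -3), jstack may not be able to attach either.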

I was hoping to chat with some of you Cassandra folks when we visited FB last 
week...perhaps we can grab coffee sometime and chat about these issues...

Thanks!

brian
________________________________________
From: Sandeep Tata [[email protected]]
Sent: Wednesday, August 19, 2009 1:29 PM
To: [email protected]
Subject: Re: Anybody experience one Cassandra server locking up?

Brian,

Are you guys planning to run workloads at Yahoo to compare Cassandra and PNUTS?
We'd be curious to see what you learn with the 0.4/trunk code.

Sandeep

On Wed, Aug 19, 2009 at 10:20 AM, Brian Frank Cooper <[email protected]> wrote:
> Probably you are right; after Jun's response I looked in the log and saw an 
> out of memory exception. I'll try the 0.4 beta...
>
> Thanks!
>
> brian
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:[email protected]]
> Sent: Wednesday, August 19, 2009 9:12 AM
> To: [email protected]
> Subject: Re: Anybody experience one Cassandra server locking up?
>
> sounds like you are exhausting the memory on that instance and it is
> going into "GC swap" trying to free enough to continue.  this is very
> easy to do on 0.3 -- try upgrading to the 0.4 beta if you are using
> 0.3.
>
> On Tue, Aug 18, 2009 at 3:36 PM, Brian Frank Cooper <[email protected]> wrote:
>> Hi folks,
>>
>> I have been loading a 6-server Cassandra cluster with 1 KB records. After a
>> few million inserts, the insert rate drops dramatically. After
>> investigation, one of the Cassandra servers seems to be in a bad state,
>> using 100% of one core on an 8-core machine, and 0% on the other cores.
>> Inserts to this box have completely stopped, and the inserts to the other
>> boxes have slowed way down (by more than a factor of 10). A "kill" or
>> "kill -3" to the bad Java process does nothing; I have to use "kill -9" to
>> stop it. Has anybody experienced anything like this?
>>
>> Additional info:
>>
>> The servers are 8-core, 8 GB machines. I am running 64-bit Java 1.6, and
>> here are the JVM options:
>>
>> # Arguments to pass to the JVM
>> JVM_OPTS=" \
>>         -ea \
>>         -Xdebug \
>>         -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n \
>>         -Xms128M \
>>         -Xmx6G \
>>         -XX:SurvivorRatio=8 \
>>         -XX:TargetSurvivorRatio=90 \
>>         -XX:+AggressiveOpts \
>>         -XX:+UseParNewGC \
>>         -XX:+UseConcMarkSweepGC \
>>         -XX:CMSInitiatingOccupancyFraction=1 \
>>         -XX:+CMSParallelRemarkEnabled \
>>         -XX:+HeapDumpOnOutOfMemoryError \
>>         -Dcom.sun.management.jmxremote.port=8080 \
>>         -Dcom.sun.management.jmxremote.ssl=false \
>>         -Dcom.sun.management.jmxremote.authenticate=false"
>>
>> (standard options from the Cassandra distribution, except for the 6 GB of
>> heap space.)
>>
>> Replication factor is 1 (this is just a test, not a production setup) and
>> memtable size is set to 1 GB.
>>
>> Thanks.
>>
>> brian
>
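
For a follow-up run, GC logging would show directly whether the busy core belongs to the collector. The following is a sketch that assumes the standard HotSpot 6 logging flags shown. The log path is hypothetical, so adjust it to the install. It appends to the JVM_OPTS variable from the quoted configuration rather than replacing the stock options.

```shell
# Hypothetical addition after the JVM_OPTS block: log every collection
# with timestamps (HotSpot 6 flags; the log path is an assumption).
JVM_OPTS="$JVM_OPTS \
        -verbose:gc \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps \
        -Xloggc:/var/log/cassandra/gc.log"
```

Back-to-back "Full GC" or "concurrent mode failure" entries that each reclaim little memory would match the "GC swap" behaviour described earlier in the thread.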
