> This sounds very much like "my heap is so consumed by (mostly) bloom
> filters that I am in steady state GC thrash."
> 
> Yes, I think that was at least part of the issue.

The rough numbers I've used to estimate working set are:

* bloom filter size for 400M rows at 0.00074 fp chance, without the Java fudge 
(they are just a big array): 714 MB (see the sketch after this list)
* memtable size: 1024 MB 
* index sampling:
        * 24 bytes + key (16 bytes for UUID) = 32 bytes per sample
        * 400M / 128 (the default sampling) = 3,125,000 samples
        * 3,125,000 * 32 bytes = 95 MB
        * Java fudge x5 or x10 = 475 MB to 950 MB
* ignoring row cache and key cache
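For reference, here's a quick sketch of where the first and third numbers come 
from, using the standard bloom filter sizing formula. This is just my 
back-of-the-envelope arithmetic, not Cassandra's exact accounting (the real 
structures carry extra JVM object overhead):

public class WorkingSetEstimate {
    public static void main(String[] args) {
        long rows = 400_000_000L;   // 400M keys
        double fp = 0.00074;        // bloom filter false positive chance

        // standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits
        double bloomBits = -rows * Math.log(fp) / (Math.log(2) * Math.log(2));
        double bloomMb = bloomBits / 8 / 1024 / 1024;        // ~715 MB

        // index sampling: one ~32 byte sample every 128 keys (default interval)
        long samples = rows / 128;                            // 3,125,000
        double indexMb = samples * 32.0 / 1024 / 1024;        // ~95 MB

        System.out.printf("bloom filters ~%.0f MB, index samples ~%.0f MB%n",
                bloomMb, indexMb);
    }
}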
 
So the high-side number is 2,213 to 2,688 MB. High because the fudge factor is 
a delicious sticky guess and the memtable space would rarely be full. 

On a 5120 MB heap with an 800 MB new generation, you have roughly 4,300 MB of 
tenured space (some goes to perm), and CMS will start collecting at 75% of 
that, i.e. 3,225 MB. Not terrible, but it depends on the working set and how 
quickly stuff gets tenured, which depends on the workload. 
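The 800 MB new size and the 75% figure come from the standard cassandra-env.sh 
settings. On a stock install they look roughly like this (treat the exact 
values as an example and check your own env file):

MAX_HEAP_SIZE="5120M"   # becomes -Xms / -Xmx
HEAP_NEWSIZE="800M"     # par new (young) generation

# CMS starts collecting the tenured generation once it is ~75% full
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"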

You can confirm these guesses somewhat manually by enabling all the GC logging 
in cassandra-env.sh. Restart the node and let it operate normally; it's 
probably best to keep repair off.
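The logging options I mean are the ones shipped (commented out) in 
cassandra-env.sh; roughly the following, give or take your version (the 
-Xloggc path is just an example):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

PrintHeapAtGC and PrintGCApplicationStoppedTime are what produce the "{Heap 
before GC ..." blocks and the "Total time for which application threads were 
stopped" lines you'll see below.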

This is a sample (partial) GC log from before CMS kicks in; note the 
concurrent mark-sweep used size. Your values may differ, as this node has 
non-default settings:

Heap after GC invocations=9947 (full 182):
 par new generation   total 1024000K, used 101882K [0x00000006fae00000, 
0x0000000745e00000, 0x0000000745e00000)
  eden space 819200K,   0% used [0x00000006fae00000, 0x00000006fae00000, 
0x000000072ce00000)
  from space 204800K,  49% used [0x0000000739600000, 0x000000073f97eaf8, 
0x0000000745e00000)
  to   space 204800K,   0% used [0x000000072ce00000, 0x000000072ce00000, 
0x0000000739600000)
 concurrent mark-sweep generation total 2965504K, used 2309885K 
[0x0000000745e00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 38052K, used 22811K [0x00000007fae00000, 
0x00000007fd329000, 0x0000000800000000)
}

This is when CMS starts; note that the (full X) count increases:

2012-10-25T03:32:44.664-0500: 76691.891: [GC [1 CMS-initial-mark: 
2309885K(2965504K)] 2411929K(3989504K), 0.0047910 secs] [Times: user=0.01 
sys=0.00, real=0.01 secs] 
Total time for which application threads were stopped: 0.0059850 seconds

(other CMS type logs)

{Heap before GC invocations=9947 (full 183):
 par new generation   total 1024000K, used 921082K [0x00000006fae00000, 
0x0000000745e00000, 0x0000000745e00000)
  eden space 819200K, 100% used [0x00000006fae00000, 0x000000072ce00000, 
0x000000072ce00000)
  from space 204800K,  49% used [0x0000000739600000, 0x000000073f97eaf8, 
0x0000000745e00000)
  to   space 204800K,   0% used [0x000000072ce00000, 0x000000072ce00000, 
0x0000000739600000)
 concurrent mark-sweep generation total 2965504K, used 2206292K 
[0x0000000745e00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 38052K, used 22811K [0x00000007fae00000, 
0x00000007fd329000, 0x0000000800000000)

A couple of log messages later, the concurrent mark-sweep used size is down:

{Heap before GC invocations=9948 (full 183):
 par new generation   total 1024000K, used 938695K [0x00000006fae00000, 
0x0000000745e00000, 0x0000000745e00000)
  eden space 819200K, 100% used [0x00000006fae00000, 0x000000072ce00000, 
0x000000072ce00000)
  from space 204800K,  58% used [0x000000072ce00000, 0x00000007342b1f00, 
0x0000000739600000)
  to   space 204800K,   0% used [0x0000000739600000, 0x0000000739600000, 
0x0000000745e00000)
 concurrent mark-sweep generation total 2965504K, used 1096146K 
[0x0000000745e00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 38052K, used 22811K [0x00000007fae00000, 
0x00000007fd329000, 0x0000000800000000)
2012-10-25T03:32:50.479-0500: 76697.706: [GC Before GC:


There are a few things you could try:

* increase the JVM heap by, say, 1 GB and see how it goes
* increase the bloom filter false positive chance; try 0.1 first (see 
http://www.datastax.com/docs/1.1/configuration/storage_configuration#bloom-filter-fp-chance)
* increase the index_interval sampling in the yaml (examples of these knobs 
are sketched after this list)
* decreasing compaction_throughput and in_memory_compaction_limit can lessen 
the additional memory pressure compaction adds
* disable caches, or ensure off-heap caches are used
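
To make those concrete, these are the knobs I mean. The values are only 
examples of where the settings live (node-wide ones in cassandra.yaml, the 
bloom filter chance per column family); "YourCF" is a placeholder, so check 
the numbers against the docs for your version:

# cassandra.yaml
index_interval: 256                   # default 128; higher = fewer samples
compaction_throughput_mb_per_sec: 8   # default 16
in_memory_compaction_limit_in_mb: 32  # default 64

# per column family, e.g. from cassandra-cli
update column family YourCF with bloom_filter_fp_chance = 0.1;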

Watching the GC logs and the Cassandra log is a great way to get a feel for 
what works in your situation. Also take note of any scheduled processing your 
app does that may impact things, and look for poorly performing queries. 

Finally, this book is a good reference on Java GC: http://amzn.com/0137142528 

For my understanding, what was the average row size for the 400 million keys? 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/10/2012, at 1:22 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> On Wed, Oct 24, 2012 at 2:38 PM, Rob Coli <rc...@palominodb.com> wrote:
> On Mon, Oct 22, 2012 at 8:38 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
> > The nodes with the most data used the most memory.  All nodes are affected
> > eventually not just one.  The GC was on-going even when the nodes were not
> > compacting or running a heavy application load -- even when the main app was
> > paused constant the GC continued.
> 
> This sounds very much like "my heap is so consumed by (mostly) bloom
> filters that I am in steady state GC thrash."
> 
> Yes, I think that was at least part of the issue.
> 
>  
> 
> Do you have heap graphs which show a healthy sawtooth GC cycle which
> then more or less flatlines?
> 
> 
> 
> I didn't save any graphs but that is what they would look like.  I was using 
> jstat to monitor gc activity. 
> 
> -Bryan
> 
