> - Different collectors: -XX:+UseParallelGC -XX:+UseParallelOldGC

Unless you also removed the -XX:+UseConcMarkSweepGC I *think* it takes precedence, so that the above options would have no effect. I didn't test.

In either case, did you definitely confirm CMS was no longer being used? (Should be pretty obvious if you ran with -XX:+PrintGCDetails, which looks plenty different w/o CMS.)

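
If the GC log is ambiguous, one way to double-check is to ask the JVM which collectors it registered. This is only a minimal sketch: the class name GcCheck and its helper methods are made up for illustration, and it assumes HotSpot reports the CMS old-gen collector under the MXBean name "ConcurrentMarkSweep" (the parallel collectors normally show up as "PS Scavenge" and "PS MarkSweep").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
class GcCheck {
    // Names of the collectors the running JVM actually registered.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Assumption: HotSpot names the CMS old-gen collector "ConcurrentMarkSweep".
    static boolean usesCms(List<String> names) {
        for (String name : names) {
            if (name.contains("ConcurrentMarkSweep")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("CMS in use: " + usesCms(names));
    }
}
```

Note that this reports on its own JVM, not on the Cassandra process, so it would have to be started with exactly the same -XX flags as the node. The same GarbageCollector MXBeans should also be visible over JMX on the running node.
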
> On #cassandra there was speculation that a large (200k) row cache may be
> inducing heap fragmentation. I have not ruled this out but have been
> unable to do that in stand alone ConcurrentLinkedHashMap stress testing.
> Since turning off the row cache would be a cure worse than the disease
> I have not tried that yet with a real cluster.

I didn't follow the IRC discussion, but I think the most likely way I can see the row cache causing fragmentation and growth of *non-java-heap* memory would be if it did so by way of the data structures maintained by CMS for old-gen. If you really made it run without CMS... I can't really claim a lot of certainty but I'd be pretty surprised if the row cache was responsible for out-of-heap memory leakage with the default compacting collectors.

> Future possibilities would be to get the limits set right for mlockall,
> trying combinations of the above, and running without caches.
>
> I have gc logs if anyone is interested.

Yes :)

> [1] http://img194.imageshack.us/img194/383/2weekmem.png

I did go back and revisit the old thread... maybe I'm missing something, but just to be real sure:

What does the "no color"/white mean on this graph? Is that application memory (resident set)?

I'm not really sure what I'm looking for since you already said you tested with 'standard' which rules out the resident-set-memory-as-a-result-of-mmap being counted towards the leak. But still.

-- 
/ Peter Schuller