[
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
]
Brandon Williams commented on CASSANDRA-2868:
---------------------------------------------
bq. Wouldn't it be worth indicating how many collections have been done
since the last log message if it's > 1, since it can be > 1.
The only reason I added count tracking was to prevent it from firing when there
were no GCs (the API is flaky). I've never actually been able to get > 1 to
happen, but we can add it to the logging.
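
For illustration, a minimal sketch of the kind of per-collector count tracking
described here, built on java.lang.management.GarbageCollectorMXBean. This is
an assumption about the approach, not the actual GCInspector patch: it only
reports when the collection count actually advanced, so an unchanged (flaky)
reading from the MBean produces no log line.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the patch: remember the last collection count/time
// per collector and only log when at least one GC actually happened.
public class GcCountTracker
{
    private final Map<String, Long> lastCount = new HashMap<String, Long>();
    private final Map<String, Long> lastTime = new HashMap<String, Long>();

    public void check()
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            Long prevCount = lastCount.put(gc.getName(), count);
            Long prevTime = lastTime.put(gc.getName(), time);
            if (prevCount == null || prevTime == null)
                continue; // first sample, nothing to compare against yet

            long collections = count - prevCount;
            long elapsedMs = time - prevTime;
            if (collections <= 0)
                continue; // no GC since the last check -> don't fire

            System.out.printf("%s: %d collection(s) totaling %d ms since last check%n",
                              gc.getName(), collections, elapsedMs);
        }
    }
}
{code}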
bq. IMO the duration-based thresholds are hard to reason about here, where
we're dealing w/ summaries and not individual GC results.
We are dealing with individual GCs at least 99% of the time in practice. The
worst case is that >1 GC inflates the gctime enough that we errantly log when
it's not needed, but I imagine that to trigger that you would have to be under
GC pressure already.
bq. I think I'd rather have something like the dropped messages logger, where
every N seconds we log the summary we get from the mbean.
That seems like it could generate a lot of noise, since GC is constantly happening.
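
For comparison, a rough sketch of what I assume the dropped-messages-style
approach would look like (again not anything in the patch): dump the cumulative
MBean summary on a fixed interval. Since minor GCs run almost continuously
under load, it would emit a line on nearly every tick, which is the noise
concern above.

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical: periodically log the cumulative GC summary, dropped-messages style.
public class PeriodicGcSummaryLogger
{
    public static void start(final long intervalSeconds)
    {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable()
        {
            public void run()
            {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                    System.out.printf("GC %s: %d collections, %d ms total%n",
                                      gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
{code}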
bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be
removed.
I think the logic there is still sound ("Did we just do a CMS? Is the heap
still 80% full?") and it seems to work as well as it always has.
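
To make the "Did we just do a CMS? Is the heap still 80% full?" logic concrete,
here is a hedged sketch using MemoryMXBean; the flushLargestMemtables and
reduceCacheSizes calls are the ones discussed above, stubbed out here since
this is not the real GCInspector code.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch of the post-full-GC pressure check described above.
public class PostGcPressureCheck
{
    private static final double EMERGENCY_THRESHOLD = 0.80; // "heap still 80% full?"

    // Call this after observing a full (CMS) collection.
    public static void afterFullCollection()
    {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        double usedFraction = (double) heap.getUsed() / heap.getMax();

        if (usedFraction > EMERGENCY_THRESHOLD)
        {
            // If a concurrent collection couldn't free the heap, shed load.
            flushLargestMemtables();
            reduceCacheSizes();
        }
    }

    private static void flushLargestMemtables() { /* placeholder for the real call */ }
    private static void reduceCacheSizes() { /* placeholder for the real call */ }
}
{code}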
> Native Memory Leak
> ------------------
>
> Key: CASSANDRA-2868
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Daniel Doubleday
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 0.8.4
>
> Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png,
> low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by
> several users on the user list, which is why I'm reporting it.
> The memory consumption of the Cassandra java process increases steadily until
> it's killed by the OS because of OOM (with no swap).
> Our server is started with -Xmx3000M and has been running for around 23 days.
> pmap -x shows (values in KB):
> Total SST: 1961616 (memory-mapped data and index files)
> Anon RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated': anon RSS (~6.2 GB) exceeds the 3 GB max heap by more than 3 GB.
> We will use BRAF on one of our less important nodes to check whether it is
> related to mmap and report back.