[
https://issues.apache.org/jira/browse/CASSANDRA-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-1776.
---------------------------------------
Resolution: Not A Problem
The errors in the attached file are not fatal. It looks like things are working
as designed, modulo the NPE in the first place.
A real OOM should take down the server; this works very well in my experience.
If you have a log file from a version after 0.6.3 where it does not, then we can fix that.
Yes, a "GC storm" will give you a scenario where the local Cassandra believes
it is fine but really it is not. Dynamic snitch is designed to mitigate this,
but really it needs to be solved through monitoring and tuning.
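
For readers unfamiliar with the "a real OOM should take down the server" behavior, here is a minimal Java sketch of the general technique: a JVM-wide uncaught-exception handler that logs every escaped exception and stops the process only when an OutOfMemoryError appears in the cause chain. This is not Cassandra's actual code. The class name FatalErrorPolicy, the method names, and the exit status 100 are all hypothetical and chosen for illustration only.

```java
// Hypothetical illustration; not taken from the Cassandra source tree.
class FatalErrorPolicy {

    /** Returns true if an OutOfMemoryError appears anywhere in the cause chain. */
    static boolean isFatal(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /**
     * Installs a JVM-wide handler for exceptions that escape a thread's run().
     * Every such exception is logged; onFatal runs only for fatal ones.
     */
    static void install(Runnable onFatal) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
            if (isFatal(error)) {
                onFatal.run();
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: halt with an arbitrary non-zero status (100) on a fatal error,
        // so a supervisor or monitoring system can see that the process died.
        install(() -> Runtime.getRuntime().halt(100));

        // A non-fatal exception is logged by the handler but does not stop the JVM.
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("non-fatal failure");
        }, "worker-1");
        worker.start();
        worker.join();
        System.out.println("process still running after a non-fatal exception");
    }
}
```

One caveat with thread pools in general: a task handed to ExecutorService.submit() has its exception captured in the returned Future, so it never reaches this handler unless something calls get() on that Future. Tasks started with execute() do propagate to the handler.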
> Untrapped exceptions in ThreadPool have a variety of ill effects
> ----------------------------------------------------------------
>
> Key: CASSANDRA-1776
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1776
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.6.5
> Reporter: Edward Capriolo
> Attachments: logs
>
>
> I have seen a variety of conditions that keep the Cassandra process running
> even though it has mostly failed. At times the node stays up sending gossip
> messages, so other nodes think the node is up. In the worst case, a node
> gets into a tight loop, fully utilizing 16 cores of a system and sending
> gossip messages that cause cascading issues across the cluster.
> I have seen untrapped OOM errors. The interesting part of the attached log
> is that we are not using super columns. I also have machines that come up out
> of a 40 second garbage collect (I assume they gossip themselves as UP),
> then go back into a garbage collect, and repeat the cycle.