[ https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987835#action_12987835 ]
Mike Malone commented on CASSANDRA-2058:
----------------------------------------
Jake/Jonathan,
FWIW, I re-implemented ExpiringMap with MapMaker using an eviction listener
(but mostly maintaining the ExpiringMap API) a little while back while
investigating some messaging service issues we were seeing. The patch is
against 0.6.8, but here's the code if you wanna try it out:
https://gist.github.com/a2f645c69ca8f44ccff3
It could definitely be simplified more by someone willing to make more
widespread code changes. Actually, I think using MapMaker directly and getting
rid of ExpiringMap would probably be best. *shrug*
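To give a sense of the approach without making anyone dig through the whole patch, here's a rough sketch of the idea. Caveat: the gist is built on MapMaker's eviction listener as it existed back then; in current Guava that expiring-map-with-callback functionality lives in CacheBuilder instead, so the names below (ExpiringCache, ExpirationHandler, etc.) are illustrative stand-ins and not what the patch actually contains.
{code:java}
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

/**
 * Rough sketch of an ExpiringMap-style wrapper backed by a Guava cache.
 * The eviction/removal callback takes the place of ExpiringMap's reaper
 * thread: entries that age out are handed to the listener instead of
 * being swept by a timer.  Names here are hypothetical, not from the patch.
 */
public class ExpiringCache<K, V>
{
    /** Callback invoked when an entry expires without being removed explicitly. */
    public interface ExpirationHandler<K, V>
    {
        void onExpiration(K key, V value);
    }

    private final Cache<K, V> cache;

    public ExpiringCache(long expirationMillis, final ExpirationHandler<K, V> handler)
    {
        cache = CacheBuilder.newBuilder()
                            .expireAfterWrite(expirationMillis, TimeUnit.MILLISECONDS)
                            .removalListener(new RemovalListener<K, V>()
                            {
                                public void onRemoval(RemovalNotification<K, V> notification)
                                {
                                    // only fire the callback for entries that aged out,
                                    // not ones removed explicitly via remove()
                                    if (notification.wasEvicted())
                                        handler.onExpiration(notification.getKey(), notification.getValue());
                                }
                            })
                            .build();
    }

    public void put(K key, V value)
    {
        cache.put(key, value);
    }

    /** Returns and removes the entry, mirroring ExpiringMap's remove semantics. */
    public V remove(K key)
    {
        V value = cache.getIfPresent(key);
        cache.invalidate(key);
        return value;
    }

    public V get(K key)
    {
        return cache.getIfPresent(key);
    }

    public ConcurrentMap<K, V> asMap()
    {
        return cache.asMap();
    }
}
{code}
One thing to keep in mind with this route: the Guava cache does its expiration housekeeping lazily, as a side effect of reads and writes, so if the map can sit idle (e.g. callbacks for a peer that has gone quiet) something would need to call cleanUp() periodically, which is roughly the job ExpiringMap's reaper thread does today.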
> Nodes periodically spike in load
> --------------------------------
>
> Key: CASSANDRA-2058
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2058
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.10, 0.7.1
> Reporter: David King
> Assignee: Jonathan Ellis
> Fix For: 0.6.11, 0.7.1
>
> Attachments: 2058-0.7-v2.txt, 2058-0.7-v3.txt, 2058-0.7.txt,
> 2058.txt, cassandra.pmc01.log.bz2, cassandra.pmc14.log.bz2, graph a.png,
> graph b.png
>
>
> (Filing as a placeholder bug as I gather information.)
> At ~10pm on 24 Jan, I upgraded our 20-node cluster from 0.6.8 to 0.6.10, turned
> on the DES, and moved some CFs from one KS into another (drain the whole
> cluster, take it down, move files, change the schema, bring it back up). Since
> then, I've had four storms in which a node's load shoots to 700+ (400% CPU on a
> 4-CPU machine) and the node becomes totally unresponsive. After a moment or two
> of that, its neighbour dies too, and the failure cascades around the ring.
> Unfortunately, because of the high load, I'm not able to get into the machine
> to pull a thread dump to see wtf it's doing as it happens.
> I've also had an issue where a single node spikes up to high load but
> recovers. This may or may not be the same issue as the non-recovering one
> above, but both are new behaviour.