[
https://issues.apache.org/jira/browse/CASSANDRA-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175302#comment-13175302
]
Marcus Eriksson commented on CASSANDRA-3624:
--------------------------------------------
Not yet, but I might be able to do it today.
This happened in a production cluster, so I will have to try to reproduce it
somewhere else.
> Hinted Handoff - related OOM
> ----------------------------
>
> Key: CASSANDRA-3624
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3624
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Marcus Eriksson
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.7
>
> Attachments: 3624.txt
>
>
> One of our nodes had collected a lot of hints for another node, so when the
> dead node came back and the row mutations were read back from disk, the node
> died with an OOM exception (and kept dying after restart, even with the heap
> increased from 8G to 12G). The heap dump contained a lot of SuperColumns, which
> our application does not use (but HH does).
> I'm guessing that each mutation is so big that PAGE_SIZE*<mutation_size> does
> not fit in memory (will check this tomorrow).
> A simple fix (if my assumption above is correct) would be to reduce the
> PAGE_SIZE in HintedHandOffManager.java to something like 10 (or even 1?) to
> reduce the memory pressure. The performance hit would be small since we
> already do the hinted handoff throttle delay sleep before sending every
> *mutation* (not every page). Thoughts?
> If anyone runs into the same problem, I got the node started again by simply
> removing the HintsColumnFamily* files.
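
The reasoning in the description can be sketched as a back-of-the-envelope
calculation. This is only an illustration under stated assumptions: the class
and method names are hypothetical (they are not part of Cassandra), and the
mutation size, backlog, and throttle delay are made-up example values, not
measurements from the affected cluster. It shows that the heap held by one page
grows with PAGE_SIZE * mutation size, while the total throttle sleep depends
only on the number of mutations.

```java
// Hypothetical helper, not Cassandra code: models the estimate from the report.
public class HintPageEstimate {

    // Rough upper bound on heap held by one page of hints, assuming every
    // mutation in the page is deserialized before the page is processed.
    public static long peakPageBytes(int pageSize, long avgMutationBytes) {
        return Math.multiplyExact((long) pageSize, avgMutationBytes);
    }

    // Total throttle sleep when the delay is applied once per mutation.
    // Page size does not appear here, so shrinking the page does not add sleep.
    public static long totalThrottleMillis(long mutationCount, long throttleDelayMillis) {
        return Math.multiplyExact(mutationCount, throttleDelayMillis);
    }

    // Number of page reads needed to replay the backlog (ceiling division).
    public static long pageReads(long mutationCount, int pageSize) {
        return (mutationCount + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long avgMutationBytes = 16L * 1024 * 1024; // assumed: 16 MiB per mutation
        long mutationCount = 10_000;               // assumed hint backlog
        long throttleDelayMillis = 50;             // assumed throttle delay

        for (int pageSize : new int[] {1000, 10, 1}) {
            System.out.printf(
                "pageSize=%d peakHeap=%d MiB pageReads=%d throttleSleep=%d ms%n",
                pageSize,
                peakPageBytes(pageSize, avgMutationBytes) / (1024 * 1024),
                pageReads(mutationCount, pageSize),
                totalThrottleMillis(mutationCount, throttleDelayMillis));
        }
    }
}
```

Under these assumptions, a smaller page costs more page reads but leaves the
throttle sleep unchanged, which is the trade-off the proposed fix relies on.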