[
https://issues.apache.org/jira/browse/CASSANDRA-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083511#comment-16083511
]
Chris Lohfink commented on CASSANDRA-13687:
-------------------------------------------
Can you include {{nodetool cfstats}} and {{nodetool netstats}} output from a
node exhibiting this? Large partitions (maximum compressed partition size in
cfstats) and excessive streaming are very expensive with this version, and if
either is present (environmental/schema related) you can resolve it with a
larger heap or by addressing your data model.
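For reference, the large-partition check is easy to script against the cfstats
output. A minimal sketch follows (Python; the field label {{Compacted partition
maximum bytes}} matches 3.0-era cfstats output, and the 100 MB threshold is an
arbitrary illustrative cutoff, not a recommendation from this ticket):
{code}
#!/usr/bin/env python
# Sketch: scan `nodetool cfstats` output and flag tables whose largest
# compacted partition exceeds a threshold. Field labels assume 3.0-era
# output; adjust the regex if your version prints them differently.
import re
import subprocess

THRESHOLD_BYTES = 100 * 1024 * 1024  # illustrative cutoff only

output = subprocess.check_output(["nodetool", "cfstats"]).decode("utf-8", "replace")

table = None
for raw in output.splitlines():
    line = raw.strip()
    if line.startswith("Table:"):
        table = line.split(":", 1)[1].strip()
    match = re.match(r"Compacted partition maximum bytes:\s*(\d+)", line)
    if match and table is not None and int(match.group(1)) > THRESHOLD_BYTES:
        print("%s: max compacted partition is %s bytes" % (table, match.group(1)))
{code}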
> Abnormal heap growth and long GC during repair.
> -----------------------------------------------
>
> Key: CASSANDRA-13687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13687
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stanislav Vishnevskiy
> Attachments: 3.0.14.png, 3.0.9.png
>
>
> We recently upgraded from 3.0.9 to 3.0.14 to get the fix from CASSANDRA-13004.
> Sadly, 3 out of the last 7 nights we have had to wake up due to Cassandra
> dying on us. We currently don't have any data to help reproduce this, but
> since there aren't many commits between the two versions the cause might be
> obvious.
> Basically, we trigger a parallel incremental repair from a single node every
> night at 1AM. That node will sometimes start allocating a lot, keeping the
> heap maxed out and triggering GC. Some of these GCs can last up to 2 minutes.
> This effectively destroys the whole cluster due to timeouts to this node.
> The only workaround we currently have is to drain the node and restart the
> repair; it has worked fine on the second attempt every time.
> I attached heap charts from 3.0.9 and 3.0.14 during repair.