[
https://issues.apache.org/jira/browse/CASSANDRA-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stanislav Vishnevskiy updated CASSANDRA-13687:
----------------------------------------------
Summary: Abnormal heap growth and CPU usage during repair. (was: Abnormal
heap growth and long GC during repair.)
> Abnormal heap growth and CPU usage during repair.
> -------------------------------------------------
>
> Key: CASSANDRA-13687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13687
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stanislav Vishnevskiy
> Attachments: 3.0.14cpu.png, 3.0.14heap.png, 3.0.14.png,
> 3.0.9heap.png, 3.0.9.png
>
>
> We recently upgraded from 3.0.9 to 3.0.14 to get the fix from CASSANDRA-13004.
> Sadly, on 3 of the last 7 nights we have had to wake up due to Cassandra dying
> on us. We currently don't have any data to help reproduce this, but since
> there aren't many commits between the 2 versions, the cause might be obvious.
> We trigger a parallel incremental repair from a single node every night at
> 1AM. That node will sometimes start allocating heavily, keeping the heap
> maxed out and triggering GC. Some of these GC pauses can last up to 2 minutes,
> which effectively brings down the whole cluster because requests to this node
> time out.
> The only workaround we currently have is to drain the node and restart the
> repair; it has worked fine the second time, every time.
> I attached heap charts from 3.0.9 and 3.0.14 during repair.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)