[ 
https://issues.apache.org/jira/browse/CASSANDRA-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-13687:
----------------------------------------
    Component/s: Streaming and Messaging

> Abnormal heap growth and CPU usage during repair.
> -------------------------------------------------
>
>                 Key: CASSANDRA-13687
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13687
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Stanislav Vishnevskiy
>         Attachments: 3.0.14cpu.png, 3.0.14heap.png, 3.0.14.png, 
> 3.0.9heap.png, 3.0.9.png
>
>
> We recently upgraded from 3.0.9 to 3.0.14 to get the fix from CASSANDRA-13004.
> Sadly, on 3 of the last 7 nights we have had to wake up due to Cassandra dying 
> on us. We currently don't have any data to help reproduce this, but maybe 
> since there aren't many commits between the 2 versions it might be obvious.
> Basically we trigger a parallel incremental repair from a single node every 
> night at 1AM. That node will sometimes start allocating a lot and keeping the 
> heap maxed and triggering GC. Some of these GC pauses can last up to 2 minutes. This 
> effectively destroys the whole cluster due to timeouts to this node.
> The only solution we currently have is to drain the node and restart the 
> repair; it has worked fine the second time, every time.
> I attached heap charts from 3.0.9 and 3.0.14 during repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
