[ https://issues.apache.org/jira/browse/CASSANDRA-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brooke Bryan updated CASSANDRA-5367:
------------------------------------
Issue Type: Bug (was: Improvement)
> Hints stuck on compaction
> -------------------------
>
> Key: CASSANDRA-5367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5367
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.2.2
> Environment: 80-node cluster on 1.2.2 (problem has been around since
> before 1.0)
> Reporter: Brooke Bryan
>
> When our cluster is handling hints, we very often see hints get stuck on
> nodes that are unable to communicate with another node. The problem is not
> that the other node is down; it is usually busy compacting or running out of
> memory. While that node is a problem in its own right and needs to be fixed,
> every other node in the cluster gets stuck waiting to deliver its hints to
> that node.
> This has a major knock-on effect throughout the entire cluster, causing hints
> to back up. Some of our nodes are backed up with 14GB of hints after the
> hints have been stuck for 2 days.
> Also, while hints are stuck like this, compactionstats shows a compaction on
> the system hints column family whose completed-bytes count never changes.
> In my experience, this is the only thing that bogs down an entire cluster,
> and it requires a lot of manual intervention to bring everything back online.
> After putting a node into debug mode, and based on the log output and on
> pausing handoffs, I have narrowed the issue down to
> startColumn = hint.name(); (around line 361 of HintedHandoffManager) and
> line 390.
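The line singled out above, startColumn = hint.name();, is the step where hinted handoff pages through stored hints: it reads a slice of hint columns starting at startColumn and then moves startColumn forward to the last hint it handled. The sketch below illustrates that paging pattern only. A TreeMap stands in for one row of the hints column family, and the class and method names (HintPagingSketch, slice, deliverAll) are hypothetical, not Cassandra APIs. It also assumes that delivered hints are deleted from the row, so the inclusive slice does not return them again. If the "deliver" step blocks while waiting on an unresponsive target, startColumn stops advancing and the loop makes no progress. That would be consistent with the symptom described, but this is an illustration, not a diagnosis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of start-column paging; a TreeMap stands in for one
// row of hints. None of these names are Cassandra APIs.
public class HintPagingSketch {

    // Inclusive slice: up to pageSize column names >= start.
    static List<String> slice(NavigableMap<String, String> row, String start, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String name : row.tailMap(start, true).keySet()) {
            if (page.size() == pageSize) break;
            page.add(name);
        }
        return page;
    }

    // Pages through the row, "delivering" each hint, deleting it, and
    // advancing startColumn to the last hint name seen on each page.
    static List<String> deliverAll(NavigableMap<String, String> row, int pageSize) {
        List<String> delivered = new ArrayList<>();
        String startColumn = ""; // "" sorts first, so the first slice starts at the beginning
        while (true) {
            List<String> page = slice(row, startColumn, pageSize);
            if (page.isEmpty()) break; // no hints left for this endpoint
            for (String name : page) {
                delivered.add(name); // stand-in for sending the hinted mutation and awaiting the ack
                row.remove(name);    // assumption: delivered hints are deleted from the row
                startColumn = name;  // analogous to startColumn = hint.name();
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 7; i++) row.put("hint-" + i, "mutation-" + i);
        // prints [hint-0, hint-1, hint-2, hint-3, hint-4, hint-5, hint-6]
        System.out.println(deliverAll(row, 3));
    }
}
```

With a page size of 3, the seven hints are read in three slices. The loop ends only when a slice comes back empty, so any step that blocks inside the inner loop holds up every hint behind it.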