[ https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100945#comment-14100945 ]
Joshua McKenzie commented on CASSANDRA-7560:
--------------------------------------------
+1 on the 3569 backport.
> 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-7560
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Vladimir Avram
> Assignee: Yuki Morishita
> Fix For: 2.0.10
>
> Attachments: 0001-backport-CASSANDRA-6747.patch,
> 0001-partial-backport-3569.patch, cassandra_daemon.log,
> cassandra_daemon_rep1.log, cassandra_daemon_rep2.log, nodetool_command.log
>
>
> Running {{nodetool repair -pr}} will sometimes hang on one of the resulting
> AntiEntropySessions.
> The system logs show the repair command starting:
> {noformat}
> INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) Starting repair command #1, repairing 256 ranges for keyspace x
> {noformat}
> You can then see a few AntiEntropySessions completing with:
> {noformat}
> INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed successfully
> {noformat}
> Eventually one AntiEntropySession hangs just before requesting the Merkle
> trees for the next column family in line for repair. The log shows the
> previous CF finishing, and then the whole repair session hangs with no
> visible progress or errors on this node or any of the related nodes.
> {noformat}
> INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 221) [repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully synced
> {noformat}
> Notes:
> * Single-DC, 6-node cluster with an average load of 86 GB per node.
> * This appears to be random; it does not always happen on the same CF or on
> the same session.