[
https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219041#comment-15219041
]
Nick Bailey commented on CASSANDRA-11461:
-----------------------------------------
Yeah. So OpsCenter lets you configure some tables for incremental repair and
some for normal subrange repair, which is what was happening in this case. So
OpsCenter is doing:
* Break up the ring into small chunks for subrange repair
* Visit a node and repair a small range for all tables that are using subrange
repair
* If any tables are configured for incremental repair, run an incremental
repair on those tables
** By default this would do a full incremental repair on those tables, which is
what was in use when this bug was hit
* Jump across the ring to a different node and repeat the above process.
It does all this in a single datacenter, since OpsCenter does cross-DC repair.
That's at least the very high level overview.
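
As a rough illustration only, the cycle described above can be sketched as a planner that emits the order of repair steps. This is not OpsCenter's code. Every name here (`RepairStep`, `split_ring`, `jump_order`, `plan_cycle`) is hypothetical. The Murmur3 token bounds, the even splitting of the ring, the "interleave the two halves of the node list" jump order, and the round-robin pairing of ranges with nodes (which ignores replica ownership) are all assumptions made to keep the sketch small.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple

# Assumption: Murmur3Partitioner token bounds.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


@dataclass(frozen=True)
class RepairStep:
    """One planned repair action (hypothetical structure)."""
    node: str
    kind: str  # "subrange" or "incremental"
    tables: Tuple[str, ...]
    token_range: Optional[Tuple[int, int]] = None


def split_ring(chunks: int) -> List[Tuple[int, int]]:
    """Break the full token ring into `chunks` contiguous subranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // chunks for i in range(chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def jump_order(nodes: Sequence[str]) -> List[str]:
    """Interleave the two halves of the node list so that consecutive
    visits land on nodes far apart in the ring (assumed heuristic)."""
    half = (len(nodes) + 1) // 2
    order: List[str] = []
    for i in range(half):
        order.append(nodes[i])
        if i + half < len(nodes):
            order.append(nodes[i + half])
    return order


def plan_cycle(
    nodes: Sequence[str],
    subrange_tables: Sequence[str],
    incremental_tables: Sequence[str],
    chunks: int,
) -> Iterator[RepairStep]:
    """Yield repair steps: for each small range, visit a node, repair the
    range for the subrange tables, then run an incremental repair on the
    incremental tables, then move on to a different node."""
    order = jump_order(nodes)
    for i, token_range in enumerate(split_ring(chunks)):
        node = order[i % len(order)]
        if subrange_tables:
            yield RepairStep(node, "subrange", tuple(subrange_tables), token_range)
        if incremental_tables:
            yield RepairStep(node, "incremental", tuple(incremental_tables))


if __name__ == "__main__":
    for step in plan_cycle(["n1", "n2", "n3", "n4"], ["ks.a"], ["ks.b"], chunks=4):
        print(step)
```

The planner only produces an ordering; it does not talk to any node. Note that in this sketch each incremental step carries no token range, matching the "full incremental repair" default described above.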
> Failed incremental repairs never cleared from pending list
> ----------------------------------------------------------
>
> Key: CASSANDRA-11461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's
> relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit a node starts flapping, which causes a few repairs to fail. These
> are never cleared out of pending repairs. Given the keyspace is replicated to
> all nodes, it means they all have pending repairs that will never complete.
> Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)