Add a JMX call to force cleaning repair sessions (in case they are hang up)
---------------------------------------------------------------------------
Key: CASSANDRA-3316
URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.8.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
Fix For: 0.8.8
A repair session has many parts, most of which are not local to the node
(meaning the node waits on those operations). You request merkle trees, then
you schedule streaming (and in 1.0.0, some of the streaming doesn't involve the
local node itself). There are many places where something can go wrong, and
when it does, the repair hangs and consequently leaves a repairSession task
sitting active on the 'AntiEntropy Session' executor.

Obviously, we should improve repair's detection of the things that can go
wrong. CASSANDRA-2433 started this and CASSANDRA-3112 is open to cover as much
of the remaining parts as possible, but my bet is that it will be hard to cover
everything (and it may not be worth handling very improbable failure
scenarios). Besides, CASSANDRA-3112 will involve changes to the wire protocol,
so it may take some time to be committed. In the meantime, it would be nice to
provide a JMX call to force termination of repairSessions so that you don't end
up with so many 'zombie' sessions on the executor that you can't submit new
ones (you could restart the node, but that's ugly). Anyway, it's not a big
issue, but it would be simple to add such a JMX call.
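
To make the proposal concrete, here is a minimal sketch of what such a JMX
call could look like. It is not the actual patch: the class names, the
`example.repair:type=RepairControl` object name, the pool size and the
latch-based stand-in for a hung session are all assumptions made for
illustration, and the operation name `forceTerminateAllRepairSessions` is
likewise only a suggestion. The only real APIs used are the standard
`java.lang.management` and `javax.management` ones.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RepairTerminationSketch {

    /** Hypothetical management interface (standard MBeans need the "MBean" suffix). */
    public interface RepairControlMBean {
        int getActiveRepairSessions();
        int forceTerminateAllRepairSessions();
    }

    /** Stand-in for a repair session blocked on a reply that never arrives. */
    static final class Session implements Runnable {
        private final CountDownLatch terminated = new CountDownLatch(1);

        @Override
        public void run() {
            try {
                terminated.await(); // a hung session would sit here forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void forceShutdown() {
            terminated.countDown(); // releases the executor thread
        }
    }

    /** Hypothetical owner of the sessions; plays the role of the repair service. */
    public static class RepairControl implements RepairControlMBean {
        private final Set<Session> active = ConcurrentHashMap.newKeySet();
        // Assumption: a small fixed pool, so zombie sessions can exhaust it.
        private final ExecutorService executor = Executors.newFixedThreadPool(2);

        public void submit() {
            Session session = new Session();
            active.add(session);
            executor.execute(session);
        }

        @Override
        public int getActiveRepairSessions() {
            return active.size();
        }

        @Override
        public int forceTerminateAllRepairSessions() {
            int count = 0;
            for (Session session : active) {
                session.forceShutdown();
                if (active.remove(session)) {
                    count++;
                }
            }
            return count; // number of sessions that were terminated
        }

        public void close() {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        RepairControl control = new RepairControl();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.repair:type=RepairControl");
        server.registerMBean(control, name);

        control.submit(); // both pool threads are now occupied
        control.submit();

        // What an operator would trigger from jconsole or another JMX client.
        Object terminated = server.invoke(name, "forceTerminateAllRepairSessions", null, null);
        System.out.println("terminated " + terminated + " sessions");

        server.unregisterMBean(name);
        control.close();
    }
}
```

The operation returns the number of sessions it released, so an operator can
tell whether anything was actually stuck. A real implementation would also
have to decide what to do about streaming or merkle tree requests still in
flight on other nodes; the sketch does not attempt that.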