As Matija mentioned, my coworker Alexander worked on Reaper. I believe the branches of most interest would be:
Incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works

UI integration with incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

@George

> When I check the log for pattern "session completed successfully" in
> system.log, I see the last finished range occurred in 14 hours ago. So I
> think it is safe to say that the repair has hanged somehow.

What is your current setting for 'streaming_socket_timeout_in_ms'? You
might want to be aware of
https://issues.apache.org/jira/browse/CASSANDRA-8611 and
https://issues.apache.org/jira/browse/CASSANDRA-11840

Depending on how long the streams are expected to be, you might want to try
'3600000 ms (1 hour)' if you are currently using 0, or increase the value
if it is already set and you think you might be hitting
https://issues.apache.org/jira/browse/CASSANDRA-11840

> In order to start another repair, do we need to 'kill' this repair. If so,
> how do we do that?

Restarting the node is a straightforward way of doing that. If you do not
want to restart for some reason, you can use JMX
(forceTerminateAllRepairSessions).

If you are going to use JMX and don't know much about it, this video of the
presentation given by Nate, another coworker, at the Cassandra Summit 2016
might be of interest:
https://www.youtube.com/watch?v=uiUThbonnpc&index=21&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-22 16:45 GMT+02:00 Li, Guangxing <guangxing...@pearson.com>:

> Romain,
>
> I had another repair that seems to just hang last night.
> When I did 'nodetool
> tpstats' on nodes, I see the following in the node where I initiated the
> repair:
> AntiEntropySessions 1 1
> On all other nodes, I see:
> AntiEntropySessions 0 0
> When I check the log for pattern "session completed successfully" in
> system.log, I see the last finished range occurred in 14 hours ago. So I
> think it is safe to say that the repair has hanged somehow. In order to
> start another repair, do we need to 'kill' this repair. If so, how do we do
> that?
>
> Thanks.
>
> George.
>
> On Thu, Sep 22, 2016 at 6:23 AM, Romain Hardouin <romainh...@yahoo.fr>
> wrote:
>
>> I meant that pending (and active) AntiEntropySessions are a simple way to
>> check if a repair is still running on a cluster. Also have a look at
>> Cassandra reaper:
>> - https://github.com/spotify/cassandra-reaper
>>
>> - https://github.com/spodkowinski/cassandra-reaper-ui
>>
>> Best,
>> Romain
>>
>>
>>
>> On Wednesday, 21 September 2016 at 22:32, "Li, Guangxing" <
>> guangxing...@pearson.com> wrote:
>>
>> Romain,
>>
>> I started running a new repair. If I see such behavior again, I will try
>> what you mentioned.
>>
>> Thanks.
>>
>
>
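
For reference, the log check George describes (finding when the last "session completed successfully" line appeared in system.log) could be scripted roughly as below. This is only a sketch: the function names are made up for illustration, and the timestamp layout ("2016-09-21 20:15:42,123") is an assumption about the log format that may differ depending on your Cassandra version and logging configuration.

```python
import re
from datetime import datetime

# Pattern quoted in the thread.
PATTERN = "session completed successfully"

# Assumption: each log line carries a timestamp such as
# "2016-09-21 20:15:42,123". Adjust the regex if your layout differs.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def last_completed_session(lines):
    """Return the timestamp of the last matching line in file order, or None."""
    last = None
    for line in lines:
        if PATTERN not in line:
            continue
        match = TS_RE.search(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return last


def hours_since(last, now):
    """Hours elapsed between two datetimes."""
    return (now - last).total_seconds() / 3600.0
```

You would pass it an open file handle for system.log (the path depends on your install) and compare the result against `datetime.now()`. A long gap, together with a non-zero AntiEntropySessions count in 'nodetool tpstats', is the same signal discussed above.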