Nitpick: nodetool repair is just called repair (or the Anti Entropy Service). Read Repair is something that happens during a read request.

Short answer: yes, it's safe to kill Cassandra during a repair. It's one of the nice things about never mutating data.

Longer answer: If nodetool compactionstats says there are no Validation compactions running (and the compaction queue is empty) and netstats says there is nothing streaming, there is a good chance the repair is finished or dead. If a neighbour dies during a repair, the node it was started on will wait for 48 hours(?) until it times out. Check the logs on the machines for errors, particularly from the AntiEntropyService. And see what compactionstats is saying on all the nodes involved in the repair.

Even longer: um, 3 TB of data is *way* too much data per node; generally happy people have up to about 200 to 300 GB per node. The reason for this recommendation is so that things like repair, compaction, node moves, etc. are manageable, and because the loss of a single node has less of an impact. I would not recommend running a live system with that much data per node.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 03:51, Adi wrote:

> We have a 4 node 0.7.6 cluster. RF=2, 3 TB data per node.
> A read repair was kicked off on node 4 last week and is still in progress.
> Later I kicked off read repair on node 2 a few days back.
> We were writing (read/write/updates/NO deletes) data while the repair was in
> progress but no data has been written for the past 3-4 days.
> I was hoping the repair should get done in that time-frame before proceeding
> with further writes/deletes.
>
> Would it be safe to stop it and kick it off per column family or do a full
> scan of all keys as suggested in an earlier discussion? Any other suggestion
> on hastening this repair.
>
> On both nodes the repair Thread is waiting at this stage for a long time (~60+
> hours)
> java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <580857f3> (a org.apache.cassandra.utils.SimpleCondition)
>     at java.lang.Object.wait(Object.java:485)
>     at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
>     at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:791)
>     Locked ownable synchronizers:
>     - None
>
> A CPU sampling for few minutes shows these methods as hot spots (mostly the
> top two)
> org.apache.cassandra.db.ColumnFamilyStore.isKeyInRemainingSSTables( )
> org.apache.cassandra.utils.BloomFilter.getHashBuckets( )
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.echoData()
>
> netstats does not show anything streaming to/from any of the nodes.
>
> -Adi Pandit
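
P.S. A rough sketch of the status check described in the reply above: run compactionstats and netstats against every node involved in the repair and read the raw output by hand. The host names are placeholders, and the helper names (build_commands, run_checks) are made up for illustration. It assumes nodetool is on the PATH and that the default JMX port is in use. It deliberately does not parse the output, since the exact text varies between Cassandra versions.

```python
import subprocess

# Hypothetical host names; replace with the nodes involved in the repair.
HOSTS = ["node1", "node2", "node3", "node4"]

# The two nodetool subcommands mentioned in the reply.
CHECKS = ["compactionstats", "netstats"]


def build_commands(hosts, checks=CHECKS):
    """Return one nodetool argument list per (host, check) pair."""
    return [["nodetool", "-h", host, check] for host in hosts for check in checks]


def run_checks(hosts):
    """Run every check and print the raw output for manual inspection."""
    for cmd in build_commands(hosts):
        print("== " + " ".join(cmd) + " ==")
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        print(result.stdout)
```

Calling run_checks(HOSTS) prints each command followed by whatever nodetool returned. Look for Validation compactions or pending tasks in compactionstats and for active streams in netstats; if every node shows neither, the repair is probably finished or dead.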