Thanks Alexander,

Now I started to run the repair with the -pr arg and with keyspace and table args.
Still, I got the following error for one of the tables:

ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288 RepairRunnable.java:246 - Repair session 89af4d10-856f-11e6-b28f-df99132d7979 for range [(8323429577695061526,8326640819362122791], ..., (4212695343340915405,4229348077081465596]]] Validation failed in /10.45.113.88

10.45.113.88 is the IP of the machine I am running nodetool on.
I'm wondering if this is normal...

Thanks,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <a...@thelastpickle.com> wrote:

> Hi,
>
> nodetool scrub won't help here, as what you're experiencing is most likely
> that one SSTable is going through anticompaction, and then another node is
> asking for a Merkle tree that involves it.
> For understandable reasons, an SSTable cannot be anticompacted and
> validation compacted at the same time.
>
> The solution here is to adjust the repair pressure on your cluster so that
> anticompaction can end before you run repair on another node.
> You may have a lot of anticompaction to do if you had high volumes of
> unrepaired data, which can take a long time depending on several factors.
>
> You can tune your repair process to make sure no anticompaction is running
> before launching a new session on another node, or you can try my Reaper
> fork that handles incremental repair:
> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
> I may have to add a few checks in order to avoid all collisions between
> anticompactions and new sessions, but it should be helpful if you struggle
> with incremental repair.
>
> In any case, check if your nodes are still anticompacting before trying to
> run a new repair session on a node.
>
> Cheers,
>
>
> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <robert.sic...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I have a cluster of 5 nodes, cassandra 3.0.5.
>> I was running nodetool repair over the last few days, one node at a time,
>> when I first encountered this exception:
>>
>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>
>> On some of the other boxes I see this:
>>
>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>> ....
>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>> java.lang.AssertionError: java.lang.InterruptedException
>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>> Caused by: java.lang.InterruptedException: null
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>     ... 6 common frames omitted
>>
>>
>> Now if I run nodetool repair I get the
>>
>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>
>> exception.
>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>> case, or would it just make it worse?
>>
>> Thanks,
>>
>> Robert
>>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
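
The advice in this thread to check that no node is still anticompacting before starting a new repair session could be scripted roughly as follows. This is only a minimal sketch, not something taken from the thread. It assumes that `nodetool` is on the PATH, that each node's JMX port is reachable via `nodetool -h <host>`, and that `nodetool compactionstats` prints the word "Anticompaction" in its compaction type column while an anticompaction is running. The function names, the host list and the polling interval are all hypothetical.

```python
import subprocess
import time


def has_anticompaction(compactionstats_output: str) -> bool:
    """Return True if any line of the output mentions an anticompaction.

    Assumption: a running anticompaction shows up with the word
    "Anticompaction" in the compaction type column.
    """
    return any("anticompaction" in line.lower()
               for line in compactionstats_output.splitlines())


def compactionstats(host: str) -> str:
    """Run `nodetool -h <host> compactionstats` and return its stdout."""
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_until_idle(hosts, fetch=compactionstats, poll_seconds=60,
                    sleep=time.sleep):
    """Block until none of the hosts reports a running anticompaction.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a live cluster.
    """
    while True:
        busy = [h for h in hosts if has_anticompaction(fetch(h))]
        if not busy:
            return
        sleep(poll_seconds)
```

A repair wrapper would call `wait_until_idle([...])` with the cluster's node addresses before moving on to `nodetool repair` on the next node. Polling every node, not only the next one to be repaired, reflects the point made above that a validation request can arrive from another node while an SSTable is still being anticompacted.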