Robert,

You can restart them in any order; that doesn't make a difference afaik.
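Whatever order is chosen, each restart should be followed by a check that the node is back and the ring is healthy before moving on. A minimal sketch of that check (host names and the sample `nodetool status` output below are illustrative, not taken from this thread):

```shell
# Count nodes that are NOT in state UN (Up/Normal), given `nodetool status`
# output on stdin. Status lines start with a two-letter code such as UN or DN.
count_not_up_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n+0 }'
}

# Canned output resembling what a 3-node cluster with one node still down
# might print (made-up addresses and IDs):
sample_status='Datacenter: dc1
==============
Status=Up/Down, State=Normal/Leaving/Joining/Moving
--  Address      Load    Tokens  Owns  Host ID  Rack
UN  10.0.0.1     1.2 GB  256     ?     aaaa     r1
UN  10.0.0.2     1.1 GB  256     ?     bbbb     r1
DN  10.0.0.3     1.3 GB  256     ?     cccc     r1'

printf '%s\n' "$sample_status" | count_not_up_normal   # one node still down

# In a real rolling restart you would, per node (hypothetical host variable):
#   restart cassandra on "$host", then poll:
#   nodetool status | count_not_up_normal   # wait until this prints 0
```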
Cheers

On Wed, Sep 28, 2016 at 17:10, Robert Sicoie <robert.sic...@gmail.com> wrote:

> Thanks Alexander,
>
> Yes, with tpstats I can see the hanging active repair(s) (output
> attached). For one there are 31 pending repairs. On the others there are
> fewer pending repairs (min 12). Is there any recommendation for the
> restart order? The ones with fewer pending repairs first, perhaps?
>
> Thanks,
> Robert
>
> On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>
>> They will show up in nodetool compactionstats:
>> https://issues.apache.org/jira/browse/CASSANDRA-9098
>>
>> Did you check nodetool tpstats to see if you didn't have any running
>> repair session? Just to make sure (and if you can actually do it), roll
>> restart the cluster and try again. Repair sessions can get sticky sometimes.
>>
>> On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie <robert.sic...@gmail.com> wrote:
>>
>>> I am using nodetool compactionstats to check for pending compactions,
>>> and it shows 0 pending on all nodes seconds before running nodetool
>>> repair. I am also monitoring PendingCompactions over JMX.
>>>
>>> Is there any other way I can find out whether any anticompaction is
>>> running on any node?
>>>
>>> Thanks a lot,
>>> Robert
>>>
>>> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>>>
>>>> Robert,
>>>>
>>>> you need to make sure you have no repair session currently running on
>>>> your cluster, and no anticompaction.
>>>> I'd recommend doing a rolling restart in order to stop all running
>>>> repairs for sure, then start the process again, node by node, checking
>>>> that no anticompaction is running before moving from one node to the next.
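The tpstats check described above can be scripted. A sketch that sums the Active and Pending columns of the repair-related thread pools from `nodetool tpstats` output (the sample output below is made up; the pool names are the usual Cassandra 3.x ones):

```shell
# Sum Active + Pending tasks across repair-related thread pools.
# A non-zero result means repair/validation work is still outstanding.
repair_tasks() {
  awk '$1 ~ /^(AntiEntropyStage|Repair#|ValidationExecutor)/ { sum += $2 + $3 } END { print sum+0 }'
}

# Illustrative tpstats excerpt (numbers invented, layout as in 3.x):
sample_tpstats='Pool Name                    Active   Pending      Completed   Blocked
ReadStage                         0         0        1029384         0
MutationStage                     0         0        2938475         0
AntiEntropyStage                  1        31          11223         0
ValidationExecutor                1         0            412         0'

printf '%s\n' "$sample_tpstats" | repair_tasks   # 33 outstanding repair tasks

# Against a live node (hypothetical host variable):
#   nodetool -h "$host" tpstats | repair_tasks
```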
>>>>
>>>> Please do not use the -pr switch, as it is both useless (token ranges
>>>> are repaired only once with inc repair, whatever the replication factor)
>>>> and harmful, since not all anticompactions will be executed (you'll still
>>>> have sstables marked as unrepaired even if the process has run entirely
>>>> with no error).
>>>>
>>>> Let us know how that goes.
>>>>
>>>> Cheers,
>>>>
>>>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie <robert.sic...@gmail.com> wrote:
>>>>
>>>>> Thanks Alexander,
>>>>>
>>>>> Now I started to run the repair with the -pr arg and with keyspace and
>>>>> table args.
>>>>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>>>> RepairRunnable.java:246 - Repair session
>>>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>>>> [(8323429577695061526,8326640819362122791],
>>>>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
>>>>> /10.45.113.88"
>>>>>
>>>>> for one of the tables. 10.45.113.88 is the IP of the machine I am
>>>>> running nodetool on.
>>>>> I'm wondering if this is normal...
>>>>>
>>>>> Thanks,
>>>>> Robert
>>>>>
>>>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> nodetool scrub won't help here, as what you're experiencing is most
>>>>>> likely that one SSTable is going through anticompaction, and then another
>>>>>> node is asking for a Merkle tree that involves it.
>>>>>> For understandable reasons, an SSTable cannot be anticompacted and
>>>>>> validation compacted at the same time.
>>>>>>
>>>>>> The solution here is to adjust the repair pressure on your cluster so
>>>>>> that anticompaction can end before you run repair on another node.
>>>>>> You may have a lot of anticompaction to do if you had high volumes of
>>>>>> unrepaired data, which can take a long time depending on several factors.
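The node-by-node discipline described in this thread (wait out anticompaction, then repair without -pr) can be sketched as a script; the sample `compactionstats` output, host variable, and keyspace name below are illustrative assumptions, not from the thread:

```shell
# Detect anticompaction in `nodetool compactionstats` output on stdin.
# Anticompaction shows up as a compaction of type "Anticompaction after repair".
anticompactions_running() {
  grep -ci 'anticompaction' || true
}

# Illustrative compactionstats excerpt (numbers invented):
sample_compactionstats='pending tasks: 1
   compaction type               keyspace  table            completed  total   unit   progress
   Anticompaction after repair   notes     operator_source  123456     789012  bytes  15.64%'

printf '%s\n' "$sample_compactionstats" | anticompactions_running   # prints 1

# In the real loop, per node (hypothetical names):
#   while [ "$(nodetool -h "$host" compactionstats | anticompactions_running)" -gt 0 ]; do
#     sleep 60   # let anticompaction finish before the next session
#   done
#   nodetool -h "$host" repair my_keyspace   # incremental repair; note: no -pr
```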
>>>>>>
>>>>>> You can tune your repair process to make sure no anticompaction is
>>>>>> running before launching a new session on another node, or you can try my
>>>>>> Reaper fork that handles incremental repair:
>>>>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>>>>> I may have to add a few checks in order to avoid all collisions
>>>>>> between anticompactions and new sessions, but it should be helpful if you
>>>>>> struggle with incremental repair.
>>>>>>
>>>>>> In any case, check whether your nodes are still anticompacting before
>>>>>> trying to run a new repair session on a node.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <robert.sic...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>>>>>> I was running nodetool repair over the last few days, one node at a
>>>>>>> time, when I first encountered this exception:
>>>>>>>
>>>>>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>>>   at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>>>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>>>>
>>>>>>> On some of the other boxes I see this:
>>>>>>>
>>>>>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>>>>>>> ....
>>>>>>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>>>>>   at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>>>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>>>>>> java.lang.AssertionError: java.lang.InterruptedException
>>>>>>>   at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>>>>>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>>>>>> Caused by: java.lang.InterruptedException: null
>>>>>>>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>>>>>   at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>>>>>   at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>   ... 6 common frames omitted
>>>>>>>
>>>>>>> Now if I run nodetool repair I get the
>>>>>>>
>>>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>>>
>>>>>>> exception.
>>>>>>> What do you suggest? Would nodetool scrub or sstablescrub help in
>>>>>>> this case, or would it just make things worse?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Robert
>>>>>>
>>>>>> --
>>>>>> -----------------
>>>>>> Alexander Dejanovski
>>>>>> France
>>>>>> @alexanderdeja
>>>>>>
>>>>>> Consultant
>>>>>> Apache Cassandra Consulting
>>>>>> http://www.thelastpickle.com