Rhys Campbell created CASSANDRA-15109:
-----------------------------------------

             Summary: nodetool repair failing with "Validation failed in /10.222.5.44"
                 Key: CASSANDRA-15109
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15109
             Project: Cassandra
          Issue Type: Bug
          Components: Tool/nodetool
            Reporter: Rhys Campbell


*Cassandra Version:* 2.2.13

*Command*

 
{noformat}
nodetool -h 127.0.0.1 -p 7199 repair -pr -full{noformat}
 

*Sample Output*

 
{noformat}
May  3 13:26:13 xxxxxxx cassandra: ERROR 11:26:13 Failed creating a merkle tree 
for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/table, 
(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for 
details){noformat}
 

On the mentioned node we have the following info logged...

 
{noformat}
May  3 13:26:13 XXXXXXXX cassandra: ERROR 11:26:13 Failed creating a merkle 
tree for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/table, 
(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for 
details){noformat}
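
To collect these failures across several nodes, a small script could pull the repair session ID, keyspace/table, token range, and endpoint out of each line. The following is only a sketch: the regular expression is inferred from the single sample line above (with the wrapped lines joined), and `parse_merkle_failure` is a hypothetical helper name, not part of Cassandra.

```python
import re

# Pattern inferred from the sample log line in this report; other Cassandra
# versions may word this message differently (assumption).
MERKLE_FAILURE = re.compile(
    r"Failed creating a merkle tree for \[repair #(?P<session>[0-9a-f-]+) "
    r"on (?P<keyspace>[^/\s]+)/(?P<table>[^,\s]+), "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\]\], "
    r"/(?P<endpoint>\S+)"
)


def parse_merkle_failure(line):
    """Return the fields of a merkle tree failure line, or None if no match."""
    # Mail and log tooling may wrap long lines, so collapse whitespace first.
    match = MERKLE_FAILURE.search(" ".join(line.split()))
    return match.groupdict() if match else None


sample = (
    "May  3 13:26:13 xxxxxxx cassandra: ERROR 11:26:13 Failed creating a merkle tree "
    "for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/table, "
    "(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for details)"
)
print(parse_merkle_failure(sample))
```

Grouping the parsed records by `session` or `endpoint` should show whether the failures cluster on particular peers or token ranges.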
 

These are always (as seen so far) preceded by...

 
{noformat}
Apr 29 00:45:04 XXXXXXXX cassandra: INFO 22:45:04 InetAddress /X.X.5.42 is now 
DOWN
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 Handshaking version with 
/10.223.5.42
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 InetAddress /X.X.5.42 is now 
UP{noformat}
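
Since these DOWN/UP transitions appear to precede each failure, it may help to list how long each peer was marked down. The following is a rough sketch, assuming the message wording shown above; `find_flaps` is a hypothetical helper, and because the excerpt carries no year, only the time-of-day field is used.

```python
import re
from datetime import datetime

# Matches the gossip state messages as they appear in this report (assumption:
# the wording is the same on all nodes).
GOSSIP_STATE = re.compile(
    r"INFO (?P<time>\d{2}:\d{2}:\d{2}) InetAddress /(?P<peer>\S+) "
    r"is now (?P<state>UP|DOWN)"
)


def find_flaps(log_text):
    """Return (peer, seconds_down) for each DOWN that is later followed by UP.

    Assumes both events fall on the same day, since only HH:MM:SS is parsed.
    """
    down_since = {}
    flaps = []
    # Collapse whitespace so messages wrapped across lines still match.
    for match in GOSSIP_STATE.finditer(" ".join(log_text.split())):
        when = datetime.strptime(match.group("time"), "%H:%M:%S")
        peer = match.group("peer")
        if match.group("state") == "DOWN":
            down_since[peer] = when
        elif peer in down_since:
            seconds = (when - down_since.pop(peer)).total_seconds()
            flaps.append((peer, seconds))
    return flaps


sample_log = """
Apr 29 00:45:04 XXXXXXXX cassandra: INFO 22:45:04 InetAddress /X.X.5.42 is now
DOWN
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 Handshaking version with
/10.223.5.42
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 InetAddress /X.X.5.42 is now
UP
"""
print(find_flaps(sample_log))
```

Comparing these timestamps with the times of the "Parent repair session ... has failed" errors would confirm or rule out the correlation.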
 

and followed by a Java stack trace...

 
{noformat}
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread 
Thread[ValidationExecutor:43,1,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException: Parent repair 
session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1206)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1131)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748) 
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing 
Memtable-compactions_in_progress@2106381056(0.156KiB serialized bytes, 9 ops, 
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Handshaking version with 
/10.223.5.42
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing 
Memtable-compactions_in_progress@134296463(0.008KiB serialized bytes, 1 ops, 
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Got error, removing parent 
repair session
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread 
Thread[AntiEntropyStage:1,5,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException: 
java.lang.RuntimeException: Parent repair session with id = 
8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:183)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748) 
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: Caused by: java.lang.RuntimeException: 
Parent repair session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:432)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:155)
 ~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: ... 6 common frames omitted{noformat}
 

I've tried a few combinations of options with the nodetool repair command. Here 
are the results...

 
{noformat}
parallelism: parallel, primary range: true, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: sequential, primary range: false, incremental: false - NOK 
(although with a different error: "failed with error Could not create snapshot 
at /X.X.5.43 (progress: 60%)")
parallelism: parallel, primary range: false, incremental: true - OK
{noformat}
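
For anyone reproducing these runs, the combinations above can be mapped back to nodetool flags. This is a sketch under an assumption: that Cassandra 2.2 defaults to a parallel, incremental repair of all ranges, so only deviations need a flag (`-seq`, `-pr`, `-full`). `repair_flags` is a hypothetical helper, and the flags should be checked against `nodetool help repair` on the version in use.

```python
def repair_flags(parallelism, primary_range, incremental):
    """Build the nodetool repair flags for one option combination.

    Assumes 2.2 defaults: parallel, all ranges, incremental.
    """
    flags = []
    if parallelism == "sequential":
        flags.append("-seq")
    elif parallelism != "parallel":
        raise ValueError("unsupported parallelism: %s" % parallelism)
    if primary_range:
        flags.append("-pr")
    if not incremental:
        flags.append("-full")
    return flags


# The distinct combinations listed in the report.
combinations = [
    ("parallel", True, False),
    ("parallel", False, False),
    ("sequential", False, False),
    ("parallel", False, True),
]
for combo in combinations:
    print(" ".join(["nodetool", "repair"] + repair_flags(*combo)))
```

Under that assumption, the first combination yields `-pr -full`, which matches the command given at the top of this report.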
This only started happening relatively recently. There have been no changes, 
major or minor, to our system that we think would cause this. It is happening 
on every node in one DC and on a few nodes in the second. The "Failed creating 
a merkle tree" error is present on every node, but most of the nodes in the 
second DC seem to complete their repair. 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
