[
https://issues.apache.org/jira/browse/CASSANDRA-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rhys Campbell updated CASSANDRA-15109:
--------------------------------------
Description:
*Cassandra Version:* 2.2.13
*Command*
{noformat}
nodetool -h 127.0.0.1 -p 7199 repair -pr -full{noformat}
*Sample Output*
{noformat}
Repair session c230e910-6d74-11e9-8952-a70261a0ced8 for range
(4812194106185100517,5213210281700525452] failed with error [repair
#c230e910-6d74-11e9-8952-a70261a0ced8 on ks/table,
(4812194106185100517,5213210281700525452]] Validation failed in /10.223.5.44
(progress: 100%)
{noformat}
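To correlate a failed session with the logs on the reported node, the session
ID, token range, and endpoint can be pulled out of the nodetool output. The
following is a minimal Python sketch, assuming the message format shown in the
sample above. The function name parse_repair_failure and the regular
expression are illustrative and not part of Cassandra.

```python
import re

# Pattern for the failure message in the sample output. The format is assumed
# from that single sample and may differ in other Cassandra versions.
FAILURE_RE = re.compile(
    r"Repair session (?P<session>[0-9a-f-]{36}) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error .*?"
    r"Validation failed in /(?P<endpoint>[\d.]+)"
)


def parse_repair_failure(text):
    """Return the session id, token range and failing endpoint, or None."""
    # The output may be hard-wrapped, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = FAILURE_RE.search(flat)
    if match is None:
        return None
    return {
        "session": match.group("session"),
        "range": (int(match.group("start")), int(match.group("end"))),
        "endpoint": match.group("endpoint"),
    }


sample = """Repair session c230e910-6d74-11e9-8952-a70261a0ced8 for range
(4812194106185100517,5213210281700525452] failed with error [repair
#c230e910-6d74-11e9-8952-a70261a0ced8 on ks/table,
(4812194106185100517,5213210281700525452]] Validation failed in /10.223.5.44
(progress: 100%)"""

print(parse_repair_failure(sample))
```

The extracted endpoint identifies which node's log to inspect, and the token
range can be used to search that log for the matching merkle tree error.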
On the mentioned node we have the following info logged...
{noformat}
May 3 13:26:13 XXXXXXXX cassandra: ERROR 11:26:13 Failed creating a merkle
tree for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/table,
(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for
details){noformat}
These are always (as seen so far) preceded by...
{noformat}
Apr 29 00:45:04 XXXXXXXX cassandra: INFO 22:45:04 InetAddress /X.X.5.42 is now
DOWN
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 Handshaking version with
/10.223.5.42
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 InetAddress /X.X.5.42 is now
UP{noformat}
and followed by a Java stack trace...
{noformat}
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread
Thread[ValidationExecutor:43,1,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException: Parent repair
session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1206)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1131)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing
Memtable-compactions_in_progress@2106381056(0.156KiB serialized bytes, 9 ops,
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Handshaking version with
/10.223.5.42
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing
Memtable-compactions_in_progress@134296463(0.008KiB serialized bytes, 1 ops,
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Got error, removing parent
repair session
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread
Thread[AntiEntropyStage:1,5,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException:
java.lang.RuntimeException: Parent repair session with id =
8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:183)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: Caused by: java.lang.RuntimeException:
Parent repair session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:432)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:155)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: ... 6 common frames omitted{noformat}
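The DOWN/UP pair that precedes each failure can be searched for across a whole
log file. The following is a rough Python sketch, assuming one syslog entry per
line in the format of the excerpts above. The function name find_flaps, the
30-second window, and the year passed in are assumptions made for illustration,
since syslog timestamps do not carry a year.

```python
import re
from datetime import datetime

# Matches gossip state changes as they appear in the syslog excerpt above.
STATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*"
    r"InetAddress /(?P<peer>\S+) is now (?P<state>UP|DOWN)"
)


def find_flaps(lines, year, max_gap_seconds=30):
    """Yield (peer, down_time, up_time) when DOWN is quickly followed by UP."""
    last_down = {}
    for line in lines:
        m = STATE_RE.search(line)
        if m is None:
            continue
        # Syslog timestamps have no year, so the caller supplies one.
        when = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        peer = m.group("peer")
        if m.group("state") == "DOWN":
            last_down[peer] = when
        elif peer in last_down:
            down = last_down.pop(peer)
            if (when - down).total_seconds() <= max_gap_seconds:
                yield peer, down, when


log = [
    "Apr 29 00:45:04 XXXXXXXX cassandra: INFO 22:45:04 InetAddress /X.X.5.42 is now DOWN",
    "Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 Handshaking version with /10.223.5.42",
    "Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 InetAddress /X.X.5.42 is now UP",
]

# The year 2019 is an assumed value for this example.
for peer, down, up in find_flaps(log, year=2019):
    print(peer, "was down for", (up - down).total_seconds(), "seconds")
```

Comparing the reported flap times with the timestamps of the "Parent repair
session ... has failed" exceptions would show whether every failure follows
such a flap.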
I've tried a few combinations of options with the nodetool repair command. Here
are the results...
{noformat}
parallelism: parallel, primary range: true, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: sequential, primary range: false, incremental: false - NOK
(although with a different error: failed with error Could not create snapshot
at /X.X.5.43 (progress: 60%))
parallelism: parallel, primary range: false, incremental: true - OK
{noformat}
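For reference, each combination above corresponds to a set of nodetool repair
flags. The following is a small Python sketch of that mapping, assuming the
Cassandra 2.2 defaults (parallel and incremental repair unless -seq or -full is
given). The helper name repair_command is hypothetical; the host and port are
taken from the command at the top of this report.

```python
def repair_command(parallelism="parallel", primary_range=False,
                   incremental=True, host="127.0.0.1", port=7199):
    """Build the nodetool argument list for one combination of options."""
    args = ["nodetool", "-h", host, "-p", str(port), "repair"]
    if primary_range:
        args.append("-pr")
    if parallelism == "sequential":
        args.append("-seq")
    elif parallelism != "parallel":
        # Only the two modes tried in this report are handled here.
        raise ValueError(f"unsupported parallelism: {parallelism}")
    if not incremental:
        # Assumes incremental repair is the default, as in Cassandra 2.2.
        args.append("-full")
    return args


# The combinations listed above, in the same order (duplicate row omitted).
combinations = [
    ("parallel", True, False),
    ("parallel", False, False),
    ("sequential", False, False),
    ("parallel", False, True),
]
for parallelism, primary_range, incremental in combinations:
    print(" ".join(repair_command(parallelism, primary_range, incremental)))
```

The first combination reproduces the command given at the top of this report.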
This only started happening relatively recently. There have been no major or
minor changes to our system that we think would cause this. The failure occurs
on every node in one DC and on a few nodes in the second. The "Failed creating
a merkle tree" error is present on every node, but most of the nodes in the
second DC seem to complete their repair.
was:
*Cassandra Version:* 2.2.13
*Command*
{noformat}
nodetool -h 127.0.0.1 -p 7199 repair -pr -full{noformat}
*Sample Output*
{noformat}
May 3 13:26:13 xxxxxxx cassandra: ERROR 11:26:13 Failed creating a merkle tree
for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/table,
(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for
details){noformat}
On the mentioned node we have the following info logged...
{noformat}
May 3 13:26:13 XXXXXXXX cassandra: ERROR 11:26:13 Failed creating a merkle
tree for [repair #8a6859c0-6d95-11e9-b769-5964d82f38b1 on ks/taböe,
(4812194106185100517,5213210281700525452]], /X.X.5.42 (see log for
details){noformat}
These are always (as seen so far) preceded by...
{noformat}
Apr 29 00:45:04 XXXXXXXX cassandra: INFO 22:45:04 InetAddress /X.X.5.42 is now
DOWN
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 Handshaking version with
/10.223.5.42
Apr 29 00:45:09 XXXXXXXX cassandra: INFO 22:45:09 InetAddress /X.X.5.42 is now
UP{noformat}
and followed by a Java stack trace...
{noformat}
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread
Thread[ValidationExecutor:43,1,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException: Parent repair
session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1206)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1131)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing
Memtable-compactions_in_progress@2106381056(0.156KiB serialized bytes, 9 ops,
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Handshaking version with
/10.223.5.42
Apr 29 00:45:10 XXXXXXXX cassandra: INFO 22:45:10 Writing
Memtable-compactions_in_progress@134296463(0.008KiB serialized bytes, 1 ops,
0%/0% of on/off-heap limit)
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Got error, removing parent
repair session
Apr 29 00:45:10 XXXXXXXX cassandra: ERROR 22:45:10 Exception in thread
Thread[AntiEntropyStage:1,5,main]
Apr 29 00:45:10 XXXXXXXX cassandra: java.lang.RuntimeException:
java.lang.RuntimeException: Parent repair session with id =
8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:183)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: at java.lang.Thread.run(Thread.java:748)
[na:1.8.0_172]
Apr 29 00:45:10 XXXXXXXX cassandra: Caused by: java.lang.RuntimeException:
Parent repair session with id = 8f9fe6c0-6a06-11e9-bd05-21e986c06e90 has failed.
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:398)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:432)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: at
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:155)
~[apache-cassandra-2.2.13.jar:2.2.13]
Apr 29 00:45:10 XXXXXXXX cassandra: ... 6 common frames omitted{noformat}
I've tried a few combinations of options with the nodetool repair command. Here
are the results...
{noformat}
parallelism: parallel, primary range: true, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: parallel, primary range: false, incremental: false - NOK
parallelism: sequential, primary range: false, incremental: false - NOK
(Although I get a different error failed with error Could not create snapshot
at /X.X.5.43 (progress: 60%))
parallelism: parallel, primary range: false, incremental: true - OK
{noformat}
This only started happening relatively recently. There's been no major, or
minor changes, to our system that we think would result in this. This is
happening on every node in one DC and on a few in the second. The "Failed
creating merkle tree" error is present on every node but most of the nodes in
the second DC seem to complete their repair.
> nodetool repair failing with "Validation failed in /10.222.5.44"
> ----------------------------------------------------------------
>
> Key: CASSANDRA-15109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15109
> Project: Cassandra
> Issue Type: Bug
> Components: Tool/nodetool
> Reporter: Rhys Campbell
> Priority: Normal
>