Aayush Gupta created CASSANDRA-20348:
----------------------------------------

             Summary: Cassandra Merkle Tree errors
                 Key: CASSANDRA-20348
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20348
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Aayush Gupta


We encountered an error while validating a repair session on our 
Cassandra cluster. We observed the following:
 * Validation failed while creating a Merkle tree for the repair 
session with ID {{a36dbed0-d787-11ef-84a0-59e4de22a9ca}} on the 
{{mailbox/messages_by_id}} table. The failed validation covered seven token 
ranges, and the logs indicate that the session could not be validated 
because the SSTables for these ranges could not be selected.

 * The logs also show a failure in the parent repair session with ID 
{{a3695200-d787-11ef-84a0-59e4de22a9ca}}. The stack trace shows 
{{ActiveRepairService.getParentRepairSession}} throwing a 
{{RuntimeException}} stating that the parent repair session has failed, 
which in turn failed the validation.

 * On node 10.X.X.X:7000, {{RepairMessageVerbHandler}} logged "Got error, 
removing parent repair session", and {{CassandraDaemon}} then reported the 
same exception as uncaught on the {{AntiEntropyStage}} thread. The repair 
did not complete.

*Logs*:
 # The validation failed due to the inability to create a Merkle tree for a set 
of SSTables, and the parent repair session was marked as failed.
 # The {{RepairMessageVerbHandler}} encountered an exception, leading to the 
failure of the repair process, as seen in the attached stack traces.

*Request*: We would appreciate assistance in understanding the cause of 
this failure, as well as any recommendations for resolving it. Specifically, 
any guidance on recovering the parent repair session or resolving issues with 
the Merkle tree creation would be helpful.



Logs:



[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 - Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573], (5013853684854274868,5016711782125970873], (2418110930062086372,2421594217423380863], (3333294399526748440,3333803051887680609], (-7673852896118720379,-7668669570613527038], (-8735570610439541581,-8735399615559719989], (3467842041551014709,3480644019042625019]]], /10.X.X.X:7000 (see log for details)
[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 ValidationManager.java:173 - Validation failed.
java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)
 
 
 
Logs from 10.X.X.X:7000 :
 
[ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,724 RepairMessageVerbHandler.java:212 - Got error, removing parent repair session
[ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,725 CassandraDaemon.java:581 - Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:215)
at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)
Caused by: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.repair.RepairMessageVerbHandler.previewKind(RepairMessageVerbHandler.java:55)
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
... 10 common frames omitted
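
To correlate these entries across nodes, it may help to pull the session IDs out of each node's log. The following is a minimal sketch, not part of Cassandra: the function name {{summarize_repair_failures}} is hypothetical, and the patterns assume the message wording shown in the logs above. Whitespace in the patterns also matches line breaks, so wrapped log text is handled.

```python
import re

# Message logged when a parent repair session has been marked failed.
# \s+ is used between words so the pattern still matches wrapped lines.
PARENT_FAILED = re.compile(
    r"Parent\s+repair\s+session\s+with\s+id\s*=\s*([0-9a-f-]{36})\s+has\s+failed"
)

# Validation session ID and keyspace/table from the Validator error line.
VALIDATION_SESSION = re.compile(r"\[repair\s+#([0-9a-f-]{36})\s+on\s+(\S+?),")


def summarize_repair_failures(log_text):
    """Return (failed parent session IDs, (validation session ID, table) pairs)."""
    parents = sorted(set(PARENT_FAILED.findall(log_text)))
    validations = sorted(set(VALIDATION_SESSION.findall(log_text)))
    return parents, validations


# Excerpt copied from the logs in this report, including the line wrapping.
sample = """[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 -
Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca
on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573],
java.lang.RuntimeException: Parent repair session with id =
a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
Caused by: java.lang.RuntimeException: Parent repair session with id =
a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
"""

parents, validations = summarize_repair_failures(sample)
print("failed parent sessions:", parents)
print("failed validations:", validations)
```

Running the same function over {{system.log}} from every replica should show whether all nodes report the same parent session ID and which node logged the failure first.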
 



