[
https://issues.apache.org/jira/browse/CASSANDRA-20348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aayush Gupta updated CASSANDRA-20348:
-------------------------------------
Summary: Issue with Merkle Tree Creation and Parent Repair Session Failure
(was: Cassandra Merkel Tree errors)
> Issue with Merkle Tree Creation and Parent Repair Session Failure
> -----------------------------------------------------------------
>
> Key: CASSANDRA-20348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20348
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Aayush Gupta
> Priority: Normal
>
> We encountered an error while attempting to validate a repair session on our
> Cassandra cluster. The following issues were observed:
> * Validation failed while creating a Merkle tree for the repair session with
> ID {{a36dbed0-d787-11ef-84a0-59e4de22a9ca}} on the
> {{mailbox/messages_by_id}} table. The error lists seven token ranges, and
> the stack trace shows validation aborting in
> {{CassandraValidationIterator.getSSTablesToValidate}} because the parent
> repair session lookup failed.
> * The logs also show a failure in the parent repair session with ID
> {{a3695200-d787-11ef-84a0-59e4de22a9ca}}. The stack trace shows that
> {{ActiveRepairService.getParentRepairSession}} could not retrieve the parent
> repair session, so the repair failed. The same exception also appears in
> the repair message handling stack trace.
> * The CassandraDaemon logs show the same exception thrown from
> {{RepairMessageVerbHandler}}, which logged "Got error, removing parent
> repair session" immediately before the uncaught exception was reported on
> the {{AntiEntropyStage:1}} thread.
> *Logs*:
> # The validation failed due to the inability to create a Merkle tree for a
> set of SSTables, and the parent repair session was marked as failed.
> # The {{RepairMessageVerbHandler}} encountered an exception, leading to the
> failure of the repair process, as seen in the attached stack traces.
> *Request*: We would appreciate assistance in understanding the cause of
> this failure, as well as any recommendations for resolving it. Specifically,
> any guidance on recovering the parent repair session or resolving issues with
> the Merkle tree creation would be helpful.
> Logs:
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 -
> Failed creating a merkle tree for [repair
> #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id,
> [(5734736046850292958,5753303790549862573],
> (5013853684854274868,5016711782125970873],
> (2418110930062086372,2421594217423380863],
> (3333294399526748440,3333803051887680609],
> (-7673852896118720379,-7668669570613527038],
> (-8735570610439541581,-8735399615559719989],
> (3467842041551014709,3480644019042625019]]], /10.X.X.X:7000 (see log for
> details)
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942
> ValidationManager.java:173 - Validation failed.
> java.lang.RuntimeException: Parent repair session with id =
> a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at
> org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
> at
> org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
> at
> org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
> at
> org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
> at
> org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
> at
> org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
> at
> org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
>
>
>
> Logs from 10.X.X.X:7000:
>
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,724
> RepairMessageVerbHandler.java:212 - Got error, removing parent repair session
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,725 CassandraDaemon.java:581
> - Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session
> with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:215)
> at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
> at
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
> Caused by: java.lang.RuntimeException: Parent repair session with id =
> a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at
> org.apache.cassandra.repair.RepairMessageVerbHandler.previewKind(RepairMessageVerbHandler.java:55)
> at
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
> ... 10 common frames omitted
>
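
For triage, the identifiers in the quoted logs can be pulled out mechanically. The following is a minimal Python sketch and is not part of the original report. The function names (unwrap, summarize), the SAMPLE excerpt, and the regular expressions are assumptions written against the log lines quoted above, and they only cover the two message formats shown there. The sketch strips the "> " quote prefixes, rejoins the wrapped lines, and collects the parent session ID, the repair session ID, the table, and the token ranges.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Patterns are assumptions based only on the messages quoted in this report.
PARENT_RE = re.compile(r"Parent repair session with id = (" + UUID + r") has failed")
SESSION_RE = re.compile(r"\[repair #(" + UUID + r") on (\w+)/(\w+),")
RANGE_RE = re.compile(r"\((-?\d+),(-?\d+)\]")

# Shortened excerpt of the quoted logs, kept in its wrapped, quoted form.
SAMPLE = """\
> Failed creating a merkle tree for [repair
> #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id,
> [(5734736046850292958,5753303790549862573],
> (-7673852896118720379,-7668669570613527038]]], /10.X.X.X:7000 (see log for
> details)
> java.lang.RuntimeException: Parent repair session with id =
> a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
"""


def unwrap(text):
    """Strip leading '>' quote markers and join wrapped lines with spaces."""
    lines = [re.sub(r"^>\s?", "", line).strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)


def summarize(text):
    """Collect session IDs, the table, and token ranges from quoted log text."""
    flat = unwrap(text)
    session = SESSION_RE.search(flat)
    return {
        "parent_sessions": sorted(set(PARENT_RE.findall(flat))),
        "repair_session": session.group(1) if session else None,
        "table": f"{session.group(2)}/{session.group(3)}" if session else None,
        "ranges": [(int(start), int(end)) for start, end in RANGE_RE.findall(flat)],
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

Running the same function over the complete logs from each node would make it easier to confirm that both stack traces refer to the same parent repair session and to list the affected token ranges in one place.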