Aayush Gupta created CASSANDRA-20348:
----------------------------------------
Summary: Cassandra Merkle Tree errors
Key: CASSANDRA-20348
URL: https://issues.apache.org/jira/browse/CASSANDRA-20348
Project: Apache Cassandra
Issue Type: Bug
Reporter: Aayush Gupta
We encountered an error while attempting to validate a repair session on our
Cassandra cluster. The following issues were observed:
* The validation failed during the creation of a Merkle tree for the repair
session with ID {{a36dbed0-d787-11ef-84a0-59e4de22a9ca}} on the
{{mailbox/messages_by_id}} table. Seven token ranges were involved in
the failure, and the logs indicate that the session could not be successfully
validated due to issues with the SSTables covering them.
* The logs also show a failure in the parent repair session with ID
{{a3695200-d787-11ef-84a0-59e4de22a9ca}}. The stack trace reveals that
the {{ActiveRepairService}} could not retrieve the parent repair session,
leading to the failure of the repair process. This caused further issues with
repair message handling in the system.
* The CassandraDaemon logs indicate that the error was related to an invalid
or incomplete parent repair session, which was being handled by the
{{RepairMessageVerbHandler}}. The repair operation failed, and the system
attempted to remove the parent repair session, but was unable to proceed
successfully.
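To make the reported failure mode concrete, here is a minimal toy model in Python. It is not Cassandra's code: the class {{ParentSessionRegistry}} and the function {{validate}} are hypothetical names, and the idea that the parent session had already been removed from a node-local registry before validation ran is only one possible reading of the stack traces, not a confirmed root cause.

```python
class ParentSessionRegistry:
    """Hypothetical stand-in for a node-local registry of parent repair sessions."""

    def __init__(self):
        self._sessions = {}

    def register(self, parent_id, tables):
        # Called when a repair is prepared on this node.
        self._sessions[parent_id] = {"tables": list(tables)}

    def remove(self, parent_id):
        # Called when the session finishes or is cleaned up after an error.
        self._sessions.pop(parent_id, None)

    def get(self, parent_id):
        session = self._sessions.get(parent_id)
        if session is None:
            # Mirrors the wording of the message seen in the logs.
            raise RuntimeError(
                f"Parent repair session with id = {parent_id} has failed."
            )
        return session


def validate(registry, parent_id, table):
    # Validation needs the parent session before it can pick SSTables.
    registry.get(parent_id)
    return f"validated {table} for {parent_id}"


registry = ParentSessionRegistry()
parent_id = "a3695200-d787-11ef-84a0-59e4de22a9ca"
registry.register(parent_id, ["mailbox.messages_by_id"])
registry.remove(parent_id)  # session cleaned up before validation runs

try:
    validate(registry, parent_id, "mailbox.messages_by_id")
except RuntimeError as exc:
    print(exc)
```

In this model, any validation request that arrives after the session has been removed fails with the same message, regardless of the state of the underlying data.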
*Logs*:
# The validation failed due to the inability to create a Merkle tree for a set
of SSTables, and the parent repair session was marked as failed.
# The {{RepairMessageVerbHandler}} encountered an exception, leading to the
failure of the repair process, as seen in the attached stack traces.
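For readers unfamiliar with what the validation step is trying to build, the following is a generic Merkle tree sketch in Python. It is not Cassandra's {{MerkleTree}} implementation; the helper names {{bucket_hashes}} and {{merkle_root}}, the use of SHA-256, and the equal-width bucketing of a token range are all assumptions chosen for illustration.

```python
import hashlib


def bucket_hashes(rows, lo, hi, buckets):
    """Hash (token, value) rows into equal-width sub-ranges of (lo, hi]."""
    digests = [hashlib.sha256() for _ in range(buckets)]
    for token, value in sorted(rows):
        if not (lo < token <= hi):
            continue  # row lies outside the range being validated
        idx = (token - lo - 1) * buckets // (hi - lo)
        digests[idx].update(f"{token}:{value}".encode())
    return [d.digest() for d in digests]


def merkle_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Two replicas that disagree on a single row.
replica_a = [(10, "a"), (30, "b"), (60, "c"), (90, "d")]
replica_b = [(10, "a"), (30, "b"), (60, "X"), (90, "d")]

leaves_a = bucket_hashes(replica_a, 0, 100, 4)
leaves_b = bucket_hashes(replica_b, 0, 100, 4)

roots_match = merkle_root(leaves_a) == merkle_root(leaves_b)
mismatched = [i for i, (x, y) in enumerate(zip(leaves_a, leaves_b)) if x != y]
```

Comparing roots tells the replicas whether they agree on the whole range; comparing leaves narrows a disagreement down to the sub-ranges that need streaming. If the tree cannot be built at all, as in the report above, no comparison can take place.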
*Request*: We would appreciate assistance in understanding the cause of
this failure, as well as any recommendations for resolving it. Specifically,
any guidance on recovering the parent repair session or resolving issues with
the Merkle tree creation would be helpful.
Logs:
[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 - Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573], (5013853684854274868,5016711782125970873], (2418110930062086372,2421594217423380863], (3333294399526748440,3333803051887680609], (-7673852896118720379,-7668669570613527038], (-8735570610439541581,-8735399615559719989], (3467842041551014709,3480644019042625019]]], /10.X.X.X:7000 (see log for details)
[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 ValidationManager.java:173 - Validation failed.
java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
    at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
    at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
    at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
    at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
    at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
    at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
    at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
    at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
    at java.util.concurrent.FutureTask.run(FutureTask.java:277)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:826)
Logs from 10.X.X.X:7000:
[ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,724 RepairMessageVerbHandler.java:212 - Got error, removing parent repair session
[ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,725 CassandraDaemon.java:581 - Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:215)
    at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
    at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
    at java.util.concurrent.FutureTask.run(FutureTask.java:277)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:826)
Caused by: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
    at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
    at org.apache.cassandra.repair.RepairMessageVerbHandler.previewKind(RepairMessageVerbHandler.java:55)
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
    ... 10 common frames omitted
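When correlating these errors across nodes, it can help to pull the session IDs and token ranges out of the log text mechanically. The following Python sketch does that for log lines shaped like the ones above. The function name {{summarize_repair_errors}} and the regular expressions are assumptions based only on the message formats shown in this report, so they may need adjusting for other Cassandra versions.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
SESSION_RE = re.compile(rf"repair #({UUID}) on (\S+?),")
PARENT_RE = re.compile(rf"Parent repair session with id = ({UUID}) has failed")
RANGE_RE = re.compile(r"\((-?\d+),(-?\d+)\]")


def summarize_repair_errors(log_text):
    """Extract repair sessions, failed parent sessions, and token ranges."""
    # Collapse line wrapping so messages split across lines still match.
    text = " ".join(log_text.split())
    return {
        "sessions": sorted(set(SESSION_RE.findall(text))),
        "parent_sessions": sorted(set(PARENT_RE.findall(text))),
        "ranges": [(int(a), int(b)) for a, b in RANGE_RE.findall(text)],
    }


# Shortened excerpt of the log lines quoted in this report.
sample = """
[ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 -
Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca
on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573],
(-7673852896118720379,-7668669570613527038]]], /10.X.X.X:7000 (see log for
details)
java.lang.RuntimeException: Parent repair session with id =
a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
"""

summary = summarize_repair_errors(sample)
for key, value in summary.items():
    print(key, value)
```

Running the same function over the logs of each replica makes it easier to check whether every node reports the same parent session ID and the same set of ranges.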