[https://issues.apache.org/jira/browse/CASSANDRA-20348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931880#comment-17931880]
Aayush Gupta commented on CASSANDRA-20348:
------------------------------------------
Hi [~aratnofsky],
We are seeing two errors on all nodes, and the Merkle tree errors described in
this issue appear after them. Please help us resolve this.
*1st error*
[ERROR] [Repair-Task:1] 2025-02-28 06:59:54,357 RepairRunnable.java:178 - Repair 7fd0c7d0-f5cb-11ef-9e89-67d429cb20c9 failed:
java.lang.RuntimeException: Did not get replies from all endpoints.
at org.apache.cassandra.service.ActiveRepairService.failRepair(ActiveRepairService.java:665)
at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:605)
at org.apache.cassandra.repair.RepairRunnable.prepare(RepairRunnable.java:393)
at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:825)
*2nd error*
[ERROR] [Repair#6412:1] 2025-02-28 07:00:15,226 RepairRunnable.java:178 - Repair 92123e10-f5cb-11ef-9e89-67d429cb20c9 failed:
java.lang.RuntimeException: Repair session 9228ac40-f5cb-11ef-9e89-67d429cb20c9 for range [(7560435422318582316,7621922760098450692]] failed with error [repair #9228ac40-f5cb-11ef-9e89-67d429cb20c9 on reaper_db/repair_schedule_by_cluster_and_keyspace, [(7560435422318582316,7621922760098450692]]] Got VALIDATION_REQ failure from /10.X.X.X:7000: UNKNOWN
at org.apache.cassandra.repair.RepairRunnable$RepairSessionCallback.onFailure(RepairRunnable.java:698)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
at org.apache.cassandra.repair.RepairSession.forceShutdown(RepairSession.java:342)
at org.apache.cassandra.repair.RepairSession$1.onFailure(RepairSession.java:323)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:825)
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9228ac40-f5cb-11ef-9e89-67d429cb20c9 on reaper_db/repair_schedule_by_cluster_and_keyspace, [(7560435422318582316,7621922760098450692]]] Got VALIDATION_REQ failure from /10.X.X.X:7000: UNKNOWN
at org.apache.cassandra.repair.messages.RepairMessage$1.onFailure(RepairMessage.java:81)
at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
... 2 common frames omitted
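The repair IDs in both errors above carry version nibble 1 (the {{11ef}} group), so they appear to be time-based UUIDs. Assuming that, the creation time embedded in each ID can be decoded with the JDK's {{java.util.UUID}}, which may help line up the repair attempts with the log timestamps on each node. The decoded instants are printed in UTC; the timezone of the logs above is not stated. This is a minimal troubleshooting sketch, and {{RepairIdTimes}} and {{creationTime}} are hypothetical names, not part of Cassandra:

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class RepairIdTimes {
    // Number of 100 ns intervals between the UUID epoch (1582-10-15T00:00Z) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Returns the creation time (UTC) embedded in a version-1, time-based UUID. */
    static Instant creationTime(String id) {
        UUID uuid = UUID.fromString(id);
        if (uuid.version() != 1) {
            // UUID.timestamp() would throw UnsupportedOperationException; fail with a clearer message.
            throw new IllegalArgumentException(id + " is not a time-based (version 1) UUID");
        }
        long ticks = uuid.timestamp() - UUID_EPOCH_OFFSET; // 100 ns ticks since 1970-01-01T00:00Z
        return Instant.ofEpochSecond(ticks / 10_000_000L, (ticks % 10_000_000L) * 100L);
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "7fd0c7d0-f5cb-11ef-9e89-67d429cb20c9",  // repair ID from the 1st error
                "92123e10-f5cb-11ef-9e89-67d429cb20c9",  // repair ID from the 2nd error
                "9228ac40-f5cb-11ef-9e89-67d429cb20c9"); // repair session ID from the 2nd error
        for (String id : ids) {
            System.out.println(id + " -> " + creationTime(id));
        }
    }
}
```

The same check applies to the IDs in the quoted issue below (for example {{a3695200-d787-11ef-84a0-59e4de22a9ca}}), since they also carry version nibble 1.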
> Issue with Merkle Tree Creation and Parent Repair Session Failure
> -----------------------------------------------------------------
>
> Key: CASSANDRA-20348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20348
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Aayush Gupta
> Priority: Normal
>
> We encountered an error while attempting to validate a repair session on our
> Cassandra cluster. The following issues were observed:
> * The validation failed during the creation of a Merkle tree for the repair
> session with ID {{a36dbed0-d787-11ef-84a0-59e4de22a9ca}} on the
> {{mailbox/messages_by_id}} table. Several token ranges were involved in the
> failure, and the Validator log reports that the Merkle tree could not be
> created for the SSTables covering these ranges.
> * The logs also show a failure in the parent repair session with ID
> {{a3695200-d787-11ef-84a0-59e4de22a9ca}}. The stack trace shows that
> {{ActiveRepairService}} reported this parent repair session as already failed
> when it tried to retrieve it, which caused the repair process to fail. This in
> turn caused further errors in repair message handling.
> * The CassandraDaemon logs indicate that the error was related to a parent
> repair session that had already been marked as failed while it was being
> handled by the {{RepairMessageVerbHandler}}. The repair operation failed, and
> the system attempted to remove the parent repair session, but could not
> proceed successfully.
> *Logs*:
> # The validation failed due to the inability to create a Merkle tree for a
> set of SSTables, and the parent repair session was marked as failed.
> # The {{RepairMessageVerbHandler}} encountered an exception, leading to the
> failure of the repair process, as seen in the attached stack traces.
> *Request*: We would appreciate assistance in understanding the cause of
> this failure, as well as any recommendations for resolving it. Specifically,
> any guidance on recovering the parent repair session or resolving issues with
> the Merkle tree creation would be helpful.
> Logs:
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 Validator.java:237 - Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca on mailbox/messages_by_id,
> [(5734736046850292958,5753303790549862573],
> (5013853684854274868,5016711782125970873],
> (2418110930062086372,2421594217423380863],
> (3333294399526748440,3333803051887680609],
> (-7673852896118720379,-7668669570613527038],
> (-8735570610439541581,-8735399615559719989],
> (3467842041551014709,3480644019042625019]]], /10.X.X.X:7000 (see log for details)
> [ERROR] [ValidationExecutor:196] 2025-01-20 17:38:23,942 ValidationManager.java:173 - Validation failed.
> java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
> at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
> at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
> at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
> at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
> at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
> at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
>
> Logs from 10.X.X.X:7000:
>
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,724 RepairMessageVerbHandler.java:212 - Got error, removing parent repair session
> [ERROR] [AntiEntropyStage:1] 2025-01-20 17:38:23,725 CassandraDaemon.java:581 - Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:215)
> at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
> at java.util.concurrent.FutureTask.run(FutureTask.java:277)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:826)
> Caused by: java.lang.RuntimeException: Parent repair session with id = a3695200-d787-11ef-84a0-59e4de22a9ca has failed.
> at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.previewKind(RepairMessageVerbHandler.java:55)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
> ... 10 common frames omitted
>
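The Validator error quoted above lists the affected token ranges as {{(left,right]}} pairs. To compare which ranges fail on which node, those pairs can be pulled out of a log message with a regular expression. This is a minimal sketch, assuming 64-bit (Murmur3-style) tokens that fit in a Java {{long}}; {{RepairRanges}} and {{TokenRange}} are hypothetical names, not Cassandra classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepairRanges {
    /** Hypothetical holder for one token range printed as (left,right]. */
    static final class TokenRange {
        final long left;   // exclusive start token
        final long right;  // inclusive end token

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + "," + right + "]";
        }
    }

    // Matches "(<signed long>,<signed long>]" as it appears in the repair log messages.
    private static final Pattern RANGE = Pattern.compile("\\((-?\\d+),(-?\\d+)\\]");

    /** Extracts every (left,right] pair from a log message, in order of appearance. */
    static List<TokenRange> parse(String logText) {
        List<TokenRange> ranges = new ArrayList<>();
        Matcher m = RANGE.matcher(logText);
        while (m.find()) {
            ranges.add(new TokenRange(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
        }
        return ranges;
    }

    public static void main(String[] args) {
        String message = "Failed creating a merkle tree for [repair #a36dbed0-d787-11ef-84a0-59e4de22a9ca"
                + " on mailbox/messages_by_id, [(5734736046850292958,5753303790549862573],"
                + " (-7673852896118720379,-7668669570613527038]]]";
        List<TokenRange> ranges = parse(message);
        ranges.sort(Comparator.comparingLong(r -> r.left)); // order by start token
        ranges.forEach(System.out::println);
    }
}
```

Sorting by start token only makes the output easier to compare across nodes; it does not handle ranges that wrap around the token ring.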