Hi all, I've been using Reaper to run repairs, but it has hung. I tried running
nodetool repair -pr
on each of the nodes, but every run fails with some form of this error:
error: Repair job has failed with the error message: Repair command #521 failed with error Did not get replies from all endpoints.. Check the logs on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: Repair command #521 failed with error Did not get replies from all endpoints.. Check the logs on the repair participants for further details
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
    at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
    at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
    at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
    at java.base/java.lang.Thread.run(Thread.java:829)
We are running Cassandra version 4.1.2-1. Output of nodetool status:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  172.16.100.45   505.66 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  380.75 GiB  200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  172.16.100.35   479.2 GiB   200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  172.16.100.252  248.69 GiB  200     ?     8f0d392f-0750-44e2-91a5-b30708ade8e4  rack1
UN  172.16.100.249  411.53 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.38   333.26 GiB  200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  172.16.100.36   405.33 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   437.74 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.248  344.4 GiB   200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.44   409.36 GiB  200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   236.08 GiB  120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.20.16    975 GiB     500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  172.16.100.34   340.77 GiB  200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  172.16.100.42   974.86 GiB  500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
Debug log has:
DEBUG [ScheduledTasks:1] 2023-08-04 11:56:04,955 MigrationCoordinator.java:264 - Pulling unreceived schema versions...
INFO [HintsDispatcher:11344] 2023-08-04 11:56:21,369 HintsDispatchExecutor.java:318 - Finished hinted handoff of file 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297-1690426370160-2.hints to endpoint /172.16.20.16:7000: 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297, partially
WARN [Messaging-OUT-/172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES] 2023-08-04 11:56:21,916 OutboundConnection.java:491 - /172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES-[no-channel] dropping message of type HINT_REQ due to error
org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel this output stream was writing to has been closed
    at org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
    at org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
    at org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
    at org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
    at org.apache.cassandra.net.AsyncMessageOutputPlus.doFlush(AsyncMessageOutputPlus.java:100)
    at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:122)
    at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:139)
    at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:77)
    at org.apache.cassandra.net.Message$Serializer.serializePost40(Message.java:844)
    at org.apache.cassandra.net.Message$Serializer.serialize(Message.java:702)
    at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:984)
    at org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)
    at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.run(OutboundConnection.java:958)
    at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:124)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out
INFO [Messaging-EventLoop-3-16] 2023-08-04 11:56:21,918 OutboundConnection.java:1153 - /172.16.100.34:7000(/172.16.100.34:59198)->/172.16.20.16:7000-LARGE_MESSAGES-2fc2c5b9 successfully connected, version = 12, framing = CRC, encryption = unencrypted
ERROR [Repair-Task:437] 2023-08-04 11:56:28,592 RepairRunnable.java:160 - Repair 30675c00-32df-11ee-a7d8-05183c68b0d0 failed:
java.lang.RuntimeException: Did not get replies from all endpoints.
    at org.apache.cassandra.service.ActiveRepairService.failRepair(ActiveRepairService.java:721)
    at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:654)
    at org.apache.cassandra.repair.RepairRunnable.prepare(RepairRunnable.java:400)
    at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:279)
    at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:248)
    at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
    at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
INFO [Repair-Task:437] 2023-08-04 11:56:28,594 RepairRunnable.java:223 - [repair #30675c00-32df-11ee-a7d8-05183c68b0d0]Repair command #522 finished with error
Any suggestions on what to do next?
Thanks!
-Joe