Can you please run nodetool describecluster from every node in the
cluster?

I once saw an issue where nodetool status showed all nodes as UN, but
describecluster did not.
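
If it helps, here is a rough sketch of how that check could be scripted
rather than run by hand on each node. It assumes nodetool is on the PATH,
that remote JMX access is enabled so "nodetool -h <host>" works, and that
the output has a "Schema versions:" section with lines of the form
"<version>: [endpoints]" plus an optional "UNREACHABLE: [...]" line. The
function names are made up for illustration.

```python
import re
import subprocess

# Assumed line shape inside "Schema versions:", e.g.
#   86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2]
#   UNREACHABLE: [10.0.0.3]
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(output):
    """Return {schema_version_or_UNREACHABLE: [endpoints]} from describecluster text."""
    versions = {}
    for line in output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            endpoints = [e.strip() for e in match.group(2).split(",") if e.strip()]
            versions[match.group(1)] = endpoints
    return versions


def schema_agrees(versions):
    """True only if there is exactly one schema version and nothing is unreachable."""
    return len(versions) == 1 and "UNREACHABLE" not in versions


def run_describecluster(host):
    """Run nodetool against one host; raises CalledProcessError if nodetool fails."""
    result = subprocess.run(
        ["nodetool", "-h", host, "describecluster"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_nodes(hosts):
    """Return {host: versions} for every node whose view is not in agreement."""
    problems = {}
    for host in hosts:
        versions = parse_schema_versions(run_describecluster(host))
        if not schema_agrees(versions):
            problems[host] = versions
    return problems
```

Calling check_nodes(["172.16.100.45", "172.16.100.34"]) with your own node
list would return only the nodes that see more than one schema version or
any UNREACHABLE endpoints. An empty result means every node you asked
agrees.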

Thanks
Surbhi

On Fri, Aug 4, 2023 at 8:59 AM Joe Obernberger <joseph.obernber...@gmail.com>
wrote:

> Hi All - been using reaper to do repairs, but it has hung.  I tried to run:
> nodetool repair -pr
> on each of the nodes, but they all fail with some form of this error:
>
> error: Repair job has failed with the error message: Repair command #521
> failed with error Did not get replies from all endpoints.. Check the
> logs on the repair participants for further details
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error
> message: Repair command #521 failed with error Did not get replies from
> all endpoints.. Check the logs on the repair participants for further
> details
>          at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
>          at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
>          at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
>          at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
>          at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
>          at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
>          at java.base/java.lang.Thread.run(Thread.java:829)
>
> Using version 4.1.2-1
> nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load        Tokens  Owns  Host ID                               Rack
> UN  172.16.100.45   505.66 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
> UN  172.16.100.251  380.75 GiB  200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
> UN  172.16.100.35   479.2 GiB   200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
> UN  172.16.100.252  248.69 GiB  200     ?     8f0d392f-0750-44e2-91a5-b30708ade8e4  rack1
> UN  172.16.100.249  411.53 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
> UN  172.16.100.38   333.26 GiB  200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
> UN  172.16.100.36   405.33 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
> UN  172.16.100.39   437.74 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
> UN  172.16.100.248  344.4 GiB   200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
> UN  172.16.100.44   409.36 GiB  200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
> UN  172.16.100.37   236.08 GiB  120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
> UN  172.16.20.16    975 GiB     500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
> UN  172.16.100.34   340.77 GiB  200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
> UN  172.16.100.42   974.86 GiB  500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1
>
> Note: Non-system keyspaces don't have the same replication settings,
> effective ownership information is meaningless
>
> Debug log has:
>
>
> DEBUG [ScheduledTasks:1] 2023-08-04 11:56:04,955 MigrationCoordinator.java:264 - Pulling unreceived schema versions...
> INFO  [HintsDispatcher:11344] 2023-08-04 11:56:21,369 HintsDispatchExecutor.java:318 - Finished hinted handoff of file 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297-1690426370160-2.hints to endpoint /172.16.20.16:7000: 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297, partially
> WARN  [Messaging-OUT-/172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES] 2023-08-04 11:56:21,916 OutboundConnection.java:491 - /172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES-[no-channel] dropping message of type HINT_REQ due to error
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel this output stream was writing to has been closed
>          at org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>          at org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>          at org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>          at org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>          at org.apache.cassandra.net.AsyncMessageOutputPlus.doFlush(AsyncMessageOutputPlus.java:100)
>          at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:122)
>          at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:139)
>          at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:77)
>          at org.apache.cassandra.net.Message$Serializer.serializePost40(Message.java:844)
>          at org.apache.cassandra.net.Message$Serializer.serialize(Message.java:702)
>          at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:984)
>          at org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)
>          at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.run(OutboundConnection.java:958)
>          at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:124)
>          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>          at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out
> INFO  [Messaging-EventLoop-3-16] 2023-08-04 11:56:21,918 OutboundConnection.java:1153 - /172.16.100.34:7000(/172.16.100.34:59198)->/172.16.20.16:7000-LARGE_MESSAGES-2fc2c5b9 successfully connected, version = 12, framing = CRC, encryption = unencrypted
> ERROR [Repair-Task:437] 2023-08-04 11:56:28,592 RepairRunnable.java:160 - Repair 30675c00-32df-11ee-a7d8-05183c68b0d0 failed:
> java.lang.RuntimeException: Did not get replies from all endpoints.
>          at org.apache.cassandra.service.ActiveRepairService.failRepair(ActiveRepairService.java:721)
>          at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:654)
>          at org.apache.cassandra.repair.RepairRunnable.prepare(RepairRunnable.java:400)
>          at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:279)
>          at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:248)
>          at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
>          at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
>          at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
>          at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
>          at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
>          at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
>          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>          at java.base/java.lang.Thread.run(Thread.java:829)
> INFO  [Repair-Task:437] 2023-08-04 11:56:28,594 RepairRunnable.java:223 - [repair #30675c00-32df-11ee-a7d8-05183c68b0d0]Repair command #522 finished with error
>
> What to do?
> Thanks!
>
> -Joe
