Finally found a message on another node that seems relevant:

INFO  [CompactionExecutor:7413] 2023-08-11 11:36:22,397 CompactionTask.java:164 - Compacting (d30b64ba-385c-11ee-8e74-edf5512ad115) [/data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-97958-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-91664-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-90239-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-99385-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-101078-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-86112-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-90753-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-53333-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-94008-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-92338-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-87273-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-82398-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-94244-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-80384-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-65431-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-90412-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-90104-big-Data.db:level=0, 
/data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-85155-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-92914-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-78344-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-53269-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-99242-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-73898-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-100473-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-76035-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-101352-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-62093-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-93643-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-97812-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-73062-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-65491-big-Data.db:level=0, /data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-93299-big-Data.db:level=0, ]
DEBUG [CompactionExecutor:7412] 2023-08-11 11:36:22,398 Directories.java:502 - DataDirectory /data/7/cassandra/data has 91947520000 bytes available, checking if we can write 10716461 bytes
INFO  [CompactionExecutor:7412] 2023-08-11 11:36:22,398 CompactionTask.java:164 - Compacting (d30b64b0-385c-11ee-8e74-edf5512ad115)
[/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-36867-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32270-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32287-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-30785-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32545-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-38791-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-38586-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-36849-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-39083-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-16383-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-17443-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-30587-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-38815-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32235-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-38817-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-19013-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32326-big-Data.db:level=0, 
/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32827-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-39106-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-42758-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-32428-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-39653-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-16889-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-18940-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-41236-big-Data.db:level=0, /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-36654-big-Data.db:level=0, ]
INFO  [CompactionExecutor:7412] 2023-08-11 11:36:22,398 NoSpamLogger.java:105 - Maximum memory usage reached (512.000MiB) for chunk-cache buffer pool, cannot allocate chunk of 8.000MiB
ERROR [CompactionExecutor:7412] 2023-08-11 11:36:23,109 JVMStabilityInspector.java:68 - Exception in thread Thread[CompactionExecutor:7412,5,CompactionExecutor]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db
        at org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:229)
        at org.apache.cassandra.io.util.BufferManagingRebufferer.rebuffer(BufferManagingRebufferer.java:79)
        at org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:67)
        at org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:61)
        at org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:90)
        at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68)
        at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:62)
        at org.apache.cassandra.db.marshal.ByteArrayAccessor.read(ByteArrayAccessor.java:103)
        at org.apache.cassandra.db.marshal.ByteArrayAccessor.read(ByteArrayAccessor.java:40)
        at org.apache.cassandra.db.marshal.AbstractType.read(AbstractType.java:530)
        at org.apache.cassandra.db.marshal.AbstractType.readArray(AbstractType.java:510)
        at org.apache.cassandra.db.ClusteringPrefix$Serializer.deserializeValuesWithoutSize(ClusteringPrefix.java:441)
        at org.apache.cassandra.db.Clustering$Serializer.deserialize(Clustering.java:165)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeOne(UnfilteredSerializer.java:478)
        at org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:435)
        at org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:84)
        at org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:62)
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:126)
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:376)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157)
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:523)
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:391)
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
        at org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)
        at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)
        at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
        at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
        at org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:275)
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:203)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:82)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:359)
        at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:113)
        at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
        at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db): corruption detected, chunk at 604552 of length 7911.
        at org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:221)
        ... 46 common frames omitted
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db): corruption detected, chunk at 604552 of length 7911.
        at org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:209)
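
For anyone digging through a long system.log for the same kind of error, here is a minimal Python sketch that pulls out the SSTable files reported as corrupted. It assumes the "CorruptSSTableException: Corrupted: <path>" message format seen in the log above and a <data_dir>/<keyspace>/<table>-<id>/<file> directory layout; the function names are hypothetical helpers, not part of Cassandra.

```python
import re

# Assumed message format, taken from the log excerpt in this thread:
#   "...CorruptSSTableException: Corrupted: /path/to/nb-1-big-Data.db"
CORRUPT_RE = re.compile(r"CorruptSSTableException: Corrupted: (\S+)")


def corrupt_sstables(log_text):
    """Return the distinct SSTable paths reported as corrupted, in first-seen order."""
    seen = []
    for path in CORRUPT_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


def keyspace_and_table(sstable_path):
    """Split '<data_dir>/<keyspace>/<table>-<id>/<file>' into (keyspace, table)."""
    parts = sstable_path.split("/")
    keyspace, table_dir = parts[-3], parts[-2]
    # The directory name is assumed to be '<table>-<id>'; drop the id suffix.
    table = table_dir.rsplit("-", 1)[0]
    return keyspace, table
```

Feeding it the contents of system.log from each node should give a short list of files and the keyspace/table each one belongs to.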

Ideas?

-Joe


On 8/7/2023 10:27 PM, manish khandelwal wrote:
What do the logs on /172.16.20.16:7000 say about the failed repair? The error indicates "Validation failed". Can you check system.log on /172.16.20.16:7000 and see what it says? It looks like you have an issue with doc/origdoc, probably a corrupt sstable. Try running repair on each table individually to see which table the repair fails for.
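
A rough Python sketch of that per-table approach follows. It assumes nodetool is on the PATH and exits with a non-zero status when a repair fails; `repair_command` and `repair_each_table` are hypothetical helper names, and the table list has to be supplied by hand.

```python
import subprocess


def repair_command(keyspace, table, full=True):
    """Build the nodetool argument list to repair a single table."""
    cmd = ["nodetool", "repair"]
    if full:
        cmd.append("--full")
    cmd += [keyspace, table]
    return cmd


def repair_each_table(keyspace, tables, run=subprocess.run):
    """Repair tables one at a time; return the tables whose repair failed.

    Assumption: a failed repair makes nodetool exit with a non-zero status.
    """
    failed = []
    for table in tables:
        result = run(repair_command(keyspace, table))
        if result.returncode != 0:
            failed.append(table)
    return failed
```

For example, `repair_each_table("doc", ["origdoc", "extractedmetadata", "source_correlations"])` would run three separate repairs and return the names of the ones that did not complete.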

Regards
Manish

On Mon, Aug 7, 2023 at 11:39 PM Joe Obernberger <joseph.obernber...@gmail.com> wrote:

    Thank you.  I've tried:
    nodetool repair --full
    nodetool repair -pr
    They all get to 57% on any of the nodes, and then fail.
    Interestingly the debug log only has INFO - there are no errors.

    [2023-08-07 14:02:09,828] Repair command #6 failed with error
    Incremental repair session 83dc17d0-354c-11ee-809c-177460b0ed52
    has failed
    [2023-08-07 14:02:09,830] Repair command #6 finished with error
    error: Repair job has failed with the error message: Repair
    command #6 failed with error Incremental repair session
    83dc17d0-354c-11ee-809c-177460b0ed52 has failed. Check the logs on
    the repair participants for further details
    -- StackTrace --
    java.lang.RuntimeException: Repair job has failed with the error
    message: Repair command #6 failed with error Incremental repair
    session 83dc17d0-354c-11ee-809c-177460b0ed52 has failed. Check the
    logs on the repair participants for further details
            at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
            at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
            at java.base/java.lang.Thread.run(Thread.java:829)

    Full repair results on another node:


    [2023-08-04 20:21:42,575] Repair session
    14830280-3304-11ee-939d-635768ac938c for range
    [(-5756366402057257951,-5754159509763216479],
    (-2469484655657848961,-2461953651636879320],
    (-5175468354897450191,-5171107677178073434],
    (-628587988891618162,-624346074440106568],
    (-6615381309032691143,-6603240846496048854],
    (6616005974054228159,6628798414170514490],
    (8013321283688199900,8017115978405113835],
    (-7829682363035100161,-7824999966028871477],
    (2848484090138352114,2852114415040125826],
    (-2477015659678818602,-2469484655657848961],
    (-2483470805982506865,-2477015659678818602]] finished (progress: 57%)
    [2023-08-04 20:36:23,786] Repair session
    14cbcb50-3304-11ee-939d-635768ac938c for range
    [(5193761311910499374,5197212898580538329],
    (-1679246469353274066,-1672836360726470435],
    (-6927245454058012407,-6922951496140109663],
    (1851771008808005661,1854683726231521039],
    (5197212898580538329,5200664485250577285],
    (1848858291384490283,1851771008808005661],
    (-4736378492502250338,-4732073287189625685],
    (-2705389975640427939,-2699099608948332293],
    (-7806270378003956741,-7796905583991499373],
    (466064862768270626,473304202405656261],
    (250549667892224144,253421473349298265],
    (-6922951496140109663,-6920804517181158291],
    (249113765163687083,250549667892224144],
    (1854683726231521039,1857596443655036418],
    (4687110928509362134,4694325991399541085],
    (-6920804517181158291,-6918657538222206919],
    (4399045818626652943,4402968741621424236],
    (473304202405656261,480543542043041896]] finished (progress: 57%)
    [2023-08-04 20:36:23,795] Repair command #12 finished with error
    error: Repair job has failed with the error message: Repair
    command #12 failed with error Repair session
    154f5330-3304-11ee-939d-635768ac938c for range
    [(5333449259855342357,5338449508113440752],
    (4959134492108085445,4965331080956982133],
    (5938148666505886222,5945280202710590417],
    (8428867157147807368,8431880058869458408],
    (5338449508113440752,5343449756371539147]] failed with error
    [repair #154f5330-3304-11ee-939d-635768ac938c on doc/origdoc,
    [(5333449259855342357,5338449508113440752],
    (4959134492108085445,4965331080956982133],
    (5938148666505886222,5945280202710590417],
    (8428867157147807368,8431880058869458408],
    (5338449508113440752,5343449756371539147]]] Validation failed in
    /172.16.20.16:7000. Check the logs on
    the repair participants for further details
    -- StackTrace --
    java.lang.RuntimeException: Repair job has failed with the error
    message: Repair command #12 failed with error Repair session
    154f5330-3304-11ee-939d-635768ac938c for range
    [(5333449259855342357,5338449508113440752],
    (4959134492108085445,4965331080956982133],
    (5938148666505886222,5945280202710590417],
    (8428867157147807368,8431880058869458408],
    (5338449508113440752,5343449756371539147]] failed with error
    [repair #154f5330-3304-11ee-939d-635768ac938c on doc/origdoc,
    [(5333449259855342357,5338449508113440752],
    (4959134492108085445,4965331080956982133],
    (5938148666505886222,5945280202710590417],
    (8428867157147807368,8431880058869458408],
    (5338449508113440752,5343449756371539147]]] Validation failed in
    /172.16.20.16:7000. Check the logs on
    the repair participants for further details
            at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
            at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
            at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
            at java.base/java.lang.Thread.run(Thread.java:829)

    I'm not sure what to do next?

    -Joe

    On 8/6/2023 8:58 AM, Josh McKenzie wrote:
    Quick drive-by observation:
    Did not get replies from all endpoints.. Check the
    logs on the repair participants for further details

        dropping message of type HINT_REQ due to error
        org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException:
        The channel this output stream was writing to has been closed


        Caused by: io.netty.channel.unix.Errors$NativeIoException:
        writeAddress(..) failed: Connection timed out


    java.lang.RuntimeException: Did not get replies from all endpoints.
    These all point to the same shaped problem: for whatever reason,
    the coordinator of this repair didn't receive replies from the
    replicas executing it. Could be that they're dead, could be they
    took too long, could be they never got the start message, etc.
    Distributed operations are tricky like that.

    Logs on the replicas doing the actual repairs should give you
    more insight; this is a pretty low level generic set of errors
    that basically amounts to "we didn't hear back from the other
    participants in time so we timed out."

    On Fri, Aug 4, 2023, at 12:02 PM, Surbhi Gupta wrote:
    Can you please try to do nodetool describecluster from every
    node of the cluster?

    I once noticed an issue where nodetool status showed all nodes as UN
    but describecluster did not.

    Thanks
    Surbhi
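
To compare describecluster output across nodes, a small Python sketch along these lines could be used. It assumes the "Schema versions:" section lists one "<uuid>: [ip, ip, ...]" line per schema version and an "UNREACHABLE: [...]" line for nodes that did not answer; the helper names are hypothetical and the exact output format may differ between versions.

```python
import re

# Assumed line shape inside the "Schema versions:" section:
#   "<36-char uuid>: [10.0.0.1, 10.0.0.2]"  or  "UNREACHABLE: [10.0.0.3]"
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output):
    """Map each schema version (or 'UNREACHABLE') to the nodes listed for it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def schema_agrees(versions):
    """True when exactly one schema version is listed and no node is unreachable."""
    return len(versions) == 1 and "UNREACHABLE" not in versions
```

Running this against the output captured on every node makes it easier to spot a node that reports a different schema version or lists peers as unreachable.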

    On Fri, Aug 4, 2023 at 8:59 AM Joe Obernberger
    <joseph.obernber...@gmail.com> wrote:

        Hi All - been using reaper to do repairs, but it has hung. 
        I tried to run:
        nodetool repair -pr
        on each of the nodes, but they all fail with some form of
        this error:

        error: Repair job has failed with the error message: Repair
        command #521
        failed with error Did not get replies from all endpoints..
        Check the
        logs on the repair participants for further details
        -- StackTrace --
        java.lang.RuntimeException: Repair job has failed with the
        error
        message: Repair command #521 failed with error Did not get
        replies from
        all endpoints.. Check the logs on the repair participants
        for further
        details
                 at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
                 at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
                 at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
                 at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
                 at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
                 at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
                 at java.base/java.lang.Thread.run(Thread.java:829)

        Using version 4.1.2-1
        nodetool status
        Datacenter: datacenter1
        =======================
        Status=Up/Down
        |/ State=Normal/Leaving/Joining/Moving
        --  Address         Load        Tokens  Owns  Host ID                               Rack
        UN  172.16.100.45   505.66 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
        UN  172.16.100.251  380.75 GiB  200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
        UN  172.16.100.35   479.2 GiB   200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
        UN  172.16.100.252  248.69 GiB  200     ?     8f0d392f-0750-44e2-91a5-b30708ade8e4  rack1
        UN  172.16.100.249  411.53 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
        UN  172.16.100.38   333.26 GiB  200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
        UN  172.16.100.36   405.33 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
        UN  172.16.100.39   437.74 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
        UN  172.16.100.248  344.4 GiB   200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
        UN  172.16.100.44   409.36 GiB  200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
        UN  172.16.100.37   236.08 GiB  120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
        UN  172.16.20.16    975 GiB     500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
        UN  172.16.100.34   340.77 GiB  200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
        UN  172.16.100.42   974.86 GiB  500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1

        Note: Non-system keyspaces don't have the same replication
        settings,
        effective ownership information is meaningless

        Debug log has:


        DEBUG [ScheduledTasks:1] 2023-08-04 11:56:04,955 MigrationCoordinator.java:264 - Pulling unreceived schema versions...
        INFO  [HintsDispatcher:11344] 2023-08-04 11:56:21,369 HintsDispatchExecutor.java:318 - Finished hinted handoff of file 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297-1690426370160-2.hints to endpoint /172.16.20.16:7000: 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297, partially
        WARN  [Messaging-OUT-/172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES] 2023-08-04 11:56:21,916 OutboundConnection.java:491 - /172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES-[no-channel] dropping message of type HINT_REQ due to error
        org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel this output stream was writing to has been closed
                 at org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
                 at org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
                 at org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
                 at org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
                 at org.apache.cassandra.net.AsyncMessageOutputPlus.doFlush(AsyncMessageOutputPlus.java:100)
                 at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:122)
                 at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:139)
                 at org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:77)
                 at org.apache.cassandra.net.Message$Serializer.serializePost40(Message.java:844)
                 at org.apache.cassandra.net.Message$Serializer.serialize(Message.java:702)
                 at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:984)
                 at org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)
                 at org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.run(OutboundConnection.java:958)
                 at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:124)
                 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
                 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
                 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
                 at java.base/java.lang.Thread.run(Thread.java:829)
        Caused by: io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out
        INFO  [Messaging-EventLoop-3-16] 2023-08-04 11:56:21,918 OutboundConnection.java:1153 - /172.16.100.34:7000(/172.16.100.34:59198)->/172.16.20.16:7000-LARGE_MESSAGES-2fc2c5b9 successfully connected, version = 12, framing = CRC, encryption = unencrypted
        ERROR [Repair-Task:437] 2023-08-04 11:56:28,592 RepairRunnable.java:160 - Repair 30675c00-32df-11ee-a7d8-05183c68b0d0 failed:
        java.lang.RuntimeException: Did not get replies from all endpoints.
                 at org.apache.cassandra.service.ActiveRepairService.failRepair(ActiveRepairService.java:721)
                 at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:654)
                 at org.apache.cassandra.repair.RepairRunnable.prepare(RepairRunnable.java:400)
                 at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:279)
                 at org.apache.cassandra.repair.RepairRunnable.run(RepairRunnable.java:248)
                 at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
                 at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
                 at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
                 at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
                 at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
                 at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
                 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
                 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
                 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
                 at java.base/java.lang.Thread.run(Thread.java:829)
        INFO  [Repair-Task:437] 2023-08-04 11:56:28,594 RepairRunnable.java:223 - [repair #30675c00-32df-11ee-a7d8-05183c68b0d0]Repair command #522 finished with error

        What to do?
        Thanks!

        -Joe

